This thesis addresses the problem of how to detect boundaries on the basis of
motion information alone. The solution proceeds in two stages: (i) the
[...] local computations. The motion boundary estimators have been
implemented  on  the  Connection  Machine,  a  large  parallel  network  of  simple, 
locally  interconnected  processors.  Further,  it  is  also  shown  that  the  visual 
flow  field  can  be  locally  estimated  as  a  by-product  of  the  early  estimation  of 
motion  boundaries,  and  a  mathematical  formulation  is  provided  to  show 
that  the  proposed  computation  of  visual  motion  is  well-posed.  The  second 
stage  consists  of  applying  and  modifying  the  Structural  Saliency  Method  by 
Sha'ashua  &  Ullman  to  extract  complete  and  unique  boundaries  from  the 
output  of  the  first  stage,  which  is  often  broadly  defined  and  can  contain  gaps. 
Results  are  presented  that  show  that  the  methods  can  successfully  segment 
complex  dynamic  images  composed  of  random-dot  patterns  or  natural 
textures.  It  is  also  shown  how  the  methods  can  be  used  in  stereopsis  and 
surface  reconstruction. 
VISUAL SYNOPSIS


The  Early  Detection  of  Motion  Boundaries 


The reader may perceive the outline of a
Dalmatian on its morning walk, but most likely
she or he will experience some difficulty because
of the absence of distinct intensity edges along
the Dalmatian's outline.

If, however, the reader were to see a motion
sequence of the Dalmatian, then she or he would
immediately perceive its outline, even though the
intensity information in the individual frames is
ambiguous.

This thesis addresses the problem of how the
outline of the Dalmatian, for example, can be
computed based on motion information alone,
without there being a sharp change in
intensity along its outline.


VISUAL SYNOPSIS 1


• Problem Statement

How to detect and group boundaries based on motion information alone, and how to
estimate visual motion early on?

• What can be computed early on? -> Potential displacements

Observation  &  Assumption 

There is a great deal of ambiguity concerning the correct match, regardless of whether intensities or edge-tokens are
used as matching primitives to compute the potential displacements. Intensity values remain roughly constant at
corresponding points in subsequent frames, and we use a Gaussian matching function, which depends on the difference
in intensity at the two points which define a potential displacement.


• How to deal with the ambiguity of the potential displacements? -> Use the fact that the
potential displacements are unimodally distributed inside an object.

Observation & Assumption

The image flow field can be approximated as locally constant. Hence, neighboring points will have a potential
displacement in common and their potential displacements will cluster around a single point in a local
two-dimensional histogram that collects the votes for the different possible motions.
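
The voting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Connection Machine implementation: the function names match_weight and local_histogram, the value sigma = 10, and the search range max_disp are hypothetical choices introduced here. Every candidate displacement of every point inside a circular neighborhood receives a vote weighted by a Gaussian function of the intensity difference.

```python
import math
from collections import defaultdict


def match_weight(i1, i2, sigma=10.0):
    """Gaussian matching function of the intensity difference.

    sigma is an assumed value; the text does not specify it.
    """
    d = i1 - i2
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def local_histogram(frame1, frame2, cx, cy, radius=2, max_disp=2, sigma=10.0):
    """Collect weighted votes for each potential displacement (dx, dy).

    Votes come from all points of frame1 inside a circle of the given
    radius centered at (cx, cy). Frames are lists of rows of intensities.
    """
    h, w = len(frame1), len(frame1[0])
    votes = defaultdict(float)
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            # Keep only points inside the circular support and the image.
            if (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2:
                continue
            if not (0 <= x < w and 0 <= y < h):
                continue
            # Every displacement within the search range is a candidate.
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    x2, y2 = x + dx, y + dy
                    if 0 <= x2 < w and 0 <= y2 < h:
                        votes[(dx, dy)] += match_weight(
                            frame1[y][x], frame2[y2][x2], sigma
                        )
    return dict(votes)
```

Inside a uniformly translating region the votes pile up at the true displacement, while the votes for the other candidates stay low wherever the texture is irregular.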


• How to detect motion boundaries? -> Look for bimodal distributions of the potential displacements

Observation

The potential displacements of points within a circle whose center is in the vicinity of a motion boundary will cluster
around two different points in a local two-dimensional histogram that collects the votes for the different possible
motions.


VISUAL  SYNOPSIS  2 


A motion boundary will cause the local histograms to be bimodal.


• How to capture the occurring bimodality?

Propose five measures that are sensitive to a motion boundary. The left column shows how they
are defined and the right column shows their value along a scanline in a
random-dot image containing a translating square.


Peak-ratio

Ratio of the height of the second-highest peak to
the height of the highest peak in a
local histogram.


Signal-Noise-ratio

Ratio of the votes for the highest peak and its neighbors
to the votes for the
remaining displacements.


Local-Support-ratio

Ratio of the highest peak to
the area of the circular
histogram support.


Chi-Square

Measures how well a
Gaussian distribution can be
fitted to a local histogram.


Kolmogorov-Smirnov

Measures the probability that two
histograms have been created by
the same population of motions.
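
The first two measures can be sketched as follows, assuming a histogram stored as a Python dictionary that maps a displacement (dx, dy) to its votes. Treating the "neighbors" of a peak as its eight adjacent displacement bins, and excluding those bins when looking for the second peak, are assumptions made for this illustration; the function names are hypothetical.

```python
def _neighborhood(disp):
    """The bin itself plus its eight adjacent displacement bins (assumed)."""
    dx, dy = disp
    return {(dx + i, dy + j) for i in (-1, 0, 1) for j in (-1, 0, 1)}


def peak_ratio(hist):
    """Second-highest peak divided by the highest peak.

    Bins adjacent to the highest peak are skipped so that the flank of
    the main peak is not mistaken for a second peak.
    """
    if not hist:
        return 0.0
    top = max(hist, key=hist.get)
    near = _neighborhood(top)
    rest = [v for d, v in hist.items() if d not in near]
    if not rest or hist[top] == 0:
        return 0.0
    return max(rest) / hist[top]


def signal_noise_ratio(hist):
    """Votes for the highest peak and its neighbors over all other votes."""
    if not hist:
        return 0.0
    top = max(hist, key=hist.get)
    near = _neighborhood(top)
    signal = sum(v for d, v in hist.items() if d in near)
    noise = sum(v for d, v in hist.items() if d not in near)
    return float("inf") if noise == 0 else signal / noise
```

With these definitions a unimodal histogram yields a peak-ratio near zero and a large signal-noise-ratio, whereas a histogram with two peaks of roughly equal strength drives the peak-ratio toward one.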




VISUAL SYNOPSIS 3


The  proposed  measures  have  a  global  extremum  at  a  motion  boundary. 


• How to infer motion boundaries?

The rows show the locally estimated boundaries for the case of a complex dynamic
random-dot pattern that contains a rotating circle and rectangle and a translating
square. The following three approaches can be used to infer the motion boundaries.


[Figure columns: peak-ratio, signal-noise-ratio, local-support-ratio, chi-square, Kolmogorov-Smirnov]

Thresholding

For each measure a
threshold can be derived, above or
below which a motion boundary
can be asserted with high certainty.


Detecting Global Extrema

A boundary can be inferred where the first derivative of a measure crosses zero, its second derivative is
of the appropriate sign, and its value is below or above a conservative threshold, which has
been chosen so that any extremum below or above it can be safely excluded.


Combining the Measures

The  measures  have  in  common  that 
they  have  a  global  extremum  at  a 
motion  boundary,  and  that  their 
local  extrema  anywhere  else  in  the 
image  are  weakly  correlated.  Hence, 
their  thickened  extrema  contours 
can  be  superimposed,  and  a  motion  boundary  is  inferred  where  they  all  intersect  (thickened  by  1,  2  or  3  pixels 
respectively).  This  approach  has  the  attractive  feature  that  it  does  not  require  the  setting  of  a  threshold. 
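
The combination step can be illustrated with a one-dimensional sketch along a single scanline, assuming that each measure has already been reduced to the set of positions of its extrema. The function names and the restriction to one dimension are simplifications introduced here; in the image the same idea would apply to thickened contours.

```python
def thicken(points, k, length):
    """Grow every extremum position by k pixels on both sides."""
    grown = set()
    for p in points:
        for q in range(p - k, p + k + 1):
            if 0 <= q < length:
                grown.add(q)
    return grown


def combine_extrema(extrema_per_measure, k, length):
    """Positions at which the thickened extrema of all measures intersect."""
    sets = [thicken(pts, k, length) for pts in extrema_per_measure]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Because the spurious extrema of the different measures are only weakly correlated, they rarely survive the intersection, which is why no threshold needs to be set.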


• How to locally estimate the visual flow field early on?

The highest peak in a local histogram corresponds to the displacement with the most local support. Hence,
this displacement represents an estimate of the image flow and the peak-ratio reflects how good the estimate is.
The computation is well-posed and consistent with human psychophysics.


Estimated  Flow  Field 


Error Flow Field


VISUAL SYNOPSIS 4

The estimated motion boundaries can be broadly defined and can contain gaps.


• How to extract complete and unique motion boundaries?

How to separate contour segments belonging to differently moving objects?

Observation 

Object boundaries are generally smooth and the flow vectors along a boundary vary smoothly.

Structural  Saliency  Method 

Extracts boundaries and closes gaps by employing a simple iterative scheme that uses an optimization
approach to measure the saliency of curves of line segments in terms of their smoothness and length.

A line segment consisting of three points is created only if the estimated flow vectors associated with its
points do not differ by more than two units, in order to prevent curves from being formed that wander across
boundaries. Each segment corresponds either to an asserted motion boundary segment or to
an empty area or gap, called a virtual segment.



The optimization problem is formulated in terms of
maximizing Q(n) over all curves of length n starting from P.
The computation becomes linear in n if Q is an extensible
function. Hence, the most salient curve of length n at P can be
found by maximizing over all segments leaving P and the
maximal curves of length (n-1) starting at the respective
end-points of these segments.

The  saliency  measure  is  associated  with  each  segment  and  not 
with  the  entire  curve. 


•  How  to  extract  a  unique  contour  ? 

If the area in which curves are allowed to form is broadly defined, then there will be several contours
growing alongside each other. To extract the most salient curve, we first have to propagate the saliency
value of the most salient segment along the curve that contributed to its value. This is done iteratively by
each segment maximizing over the value of its preferred neighbor and its own. Thus, the largest value will
be propagated along its curve. Finally, we perform a non-maximal suppression operation, where each
segment suppresses all its neighboring segments if their saliency value is less and if they have similar
motion estimates associated with them. Hence, the most salient contours belonging to differently moving
objects will remain alongside each other.

Input 

Estimated  motion 
boundaries 


Output 

Connected contours belonging to
differently moving objects
Chapter 1
Introduction 




It  is  a  major  goal  of  vision  to  infer  the  physical  properties  of  the  objects 
present  in  a  scene,  such  as  their  three-dimensional  structure  and  motion 
in  space.  An  essential  first  step  towards  this  goal  is  the  segmentation  of 
the  image  into  regions  that  are  likely  to  correspond  to  different  objects. 

This  early  segmentation  can  be  used  to  guide  and  substantially 
facilitate  the  further  processing  of  the  image.  Firstly,  it  provides  the 
boundary  conditions  required  by  many  early  vision  modules,  such  as 
optical  flow,  stereopsis,  shape  from  shading,  and  surface  reconstruction. 
For  example,  many  models  for  these  processes  assume  that  the  visible 
surfaces  are  generally  smooth  [14,15,17,19,28,40].  Without  prior 
knowledge  of  the  boundaries,  however,  these  computations  tend  to 
impose the smoothness assumption across boundaries, leading to errors in
the computed motion, stereo and 3-D shape [15,17,40]. Secondly,
boundaries are ideal for integrating information provided by the different
early  vision  modules  [12].  Thirdly,  the  early  detection  of  boundaries 
provides  the  input  to  visual  routines  that  establish  higher-order  shape 
properties  and  spatial  relations  among  entities  in  the  image  [44].  These 
processes  can  focus  the  attention  of  higher-level  modules  on  the  edges  of 
interest  in  a  scene  and  they  can  preferentially  allocate  processing 
resources  to  these  structures  of  interest.  Fourthly,  early  segmentation 
provides  the  critical  input  to  recognition  processes,  since  salient  and 
grouped  edges  greatly  reduce  the  combinatorial  problem  facing  the 
recognition  methods,  which  often  depend  on  the  number  of  edge 
primitives  having  to  be  examined. 

Hence, a key problem of early vision is the detection of boundaries.
This problem, however, is difficult because the only information
available is a large array of intensity measurements. Likewise, detection
of boundaries from early 2-D or 3-D representations is difficult because
they are often sparse, noisy and inaccurate, especially in the vicinity of
object boundaries.
1.3 The Difficulties

The  fundamental  problem  that  arises  in  the  computation  of  motion  and 
its  boundaries  is  that  the  movement  of  elements  in  an  image  is  not  given 
directly.  It  has  to  be  computed  from  more  elementary  measurements.  All 
we  are  given  initially  are  the  temporal  changes  of  the  intensity  values  at 
each  image  point,  which  allow  us  only  to  compute  the  flow  component 
in  the  direction  of  the  image  gradient  due  to  the  aperture  problem  [23]. 
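
The restriction imposed by the aperture problem can be made concrete with a short sketch. It assumes the standard brightness-constancy constraint, Ix*u + Iy*v + It = 0, which the text does not spell out; the function name normal_flow and the tolerance eps are hypothetical. Only the component of the flow (u, v) along the intensity gradient is recoverable from the local derivatives.

```python
import math


def normal_flow(ix, iy, it, eps=1e-9):
    """Flow component along the intensity gradient at one image point.

    ix, iy are the spatial intensity derivatives and it the temporal
    derivative. Returns the normal flow vector, or None where the
    gradient vanishes and no component can be measured.
    """
    mag = math.hypot(ix, iy)
    if mag < eps:
        return None
    speed = -it / mag  # signed speed along the unit gradient direction
    return (speed * ix / mag, speed * iy / mag)
```

For example, a purely horizontal gradient yields only the horizontal part of the true motion; the component along the edge is lost and has to be supplied by integrating measurements over a neighborhood.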

One  possible  solution  to  this  problem  is  to  compute  the  flow  field  and 
its  boundaries  simultaneously,  using  for  example  a  Markov  Random 
Field model and its line processes [13,18,24]. These time-consuming
schemes would be greatly facilitated if the boundaries were either already
known or at least estimated. A more common approach is to compute the
image  flow  field  first  and  then  to  detect  motion  boundaries.  This 
approach  has  several  inherent  difficulties  which  will  be  discussed  now. 

The methods for computing visual motion fall into two classes:
intensity-based and token-matching schemes. Intensity-based methods
have  to  integrate  the  local  motion  measurements  due  to  the  aperture 
problem  [23].  This  integration  problem  is  commonly  solved  by  assuming 
that  the  image  flow  field  varies  smoothly  in  the  image  [2,15,17,27,28].  This 
constraint  is  valid  everywhere  except  at  object  boundaries.  Because  of 
this,  considerable  error  will  occur  in  the  vicinity  of  object  boundaries 
[17,43].  A  further  problem  is  that  the  computed  flow  field  is  often  noisy 
and  inaccurate  due  to  error  in  the  initial  motion  measurements.  As  a 
consequence,  edge  detectors  that  locate  sharp  changes  in  the  components 
of  the  computed  image  flow  field  will  detect  many  incorrect  motion 
boundaries [15].

Token-matching  schemes  have  to  solve  the  difficult  correspondence 
problem  in  order  to  compute  motion,  and  they  usually  produce  a  sparse 
flow  field  [42].  Such  a  flow  field  needs  to  be  smoothly  interpolated  so  that 
edge detectors can be applied to locate the motion boundaries. Without
the  knowledge  of  the  boundaries,  however,  the  interpolation  scheme  will 
cause  the  motion  boundaries  to  be  smoothed  over  to  such  a  degree  that  it 
may  become  impossible  to  recover  them,  or  the  ones  that  can  still  be 
detected  by  the  edge  operator  will  be  poorly  localized. 
To summarize, neither class of methods for computing visual motion
provides an image flow field from which boundaries can be
detected easily and reliably. The computation of motion and the detection
of motion boundaries are faced with a dilemma: in order to detect
boundaries with existing edge detectors, an almost error-free and densely
defined image flow field is required, but a necessary condition for
computing such a flow field is the knowledge of the boundaries prior to
its computation.

Thus, it is necessary and desirable to be able to decouple the detection
of motion boundaries from the computation of the image flow field. But
what information other than the image flow field can be used to detect
motion boundaries? Which quantities can be easily computed at such an
early stage to compute a useful estimate of the motion boundaries?

1.4  Detecting  Motion  Boundaries  Early  On 

The  early  detection  of  motion  boundaries  can  be  performed  in  two  stages: 
(i)  the  local  estimation  of  the  motion  discontinuities;  (ii)  the  extraction  of 
complete boundaries belonging to differently moving objects.

1.4.1  The  First  Stage 

For the first stage, three new methods are developed that can perform the
local estimation of motion boundaries: the Bimodality Tests, the
Bi-distribution Test and the Dynamic Occlusion Method. It is also shown
how  visual  motion  can  be  locally  estimated  as  a  by-product  of  the  early 
estimation  of  motion  boundaries. 

The first two methods make use of the fact that at a motion boundary
certain  quantities,  which  can  be  easily  computed  early  on,  will  cluster 
around  two  different  points  in  a  local  histogram.  The  quantities  in 
question  are  (i)  the  potential  displacements  of  an  image  point,  and  (ii)  the 
flow  component  measured  in  the  direction  of  the  intensity  gradient.  The 
local  histograms  are  constructed  at  every  point  using  a  circular 
neighborhood  whose  radius  will  range  between  five  and  eight  pixels. 
If a local histogram is computed in the vicinity of a motion boundary
then the resulting histogram of these quantities will be bimodal, where
the two peaks are of roughly equal strength. Hence, the Bimodality Tests
detect motion boundaries by computing the degree of bimodality present
in the local histograms. The Bi-distribution Test employs a
non-parametric statistical test to detect boundaries, using the fact that the
populations of motions are different on the two sides of a boundary. The
Dynamic Occlusion Method is based on the fact that intensity edges of
opposite contrast, called thin-bars, will be created or destroyed in the
vicinity of a motion boundary. A method is developed that can locally
compute the appearance and disappearance of thin-bars in a way that is
sufficient to estimate motion boundaries, without having to solve a
global and difficult correspondence problem.

The  computation  of  the  visual  flow  field  and  the  detection  of  its 
boundaries  can  be  performed  in  parallel,  since  the  highest  peak  in  a  local 
histogram  of  the  potential  displacements  corresponds  to  the  motion  with 
the  most  local  support.  Hence,  this  displacement  represents  an  estimate 
of the image flow. The measures that are sensitive to the degree of bimodality
occurring in the local histograms will reflect how good the estimate is. A
mathematical formulation is provided to show that the proposed
computation of visual motion is well-posed, and it is demonstrated that
the developed method is similar to the local voting scheme proposed by
Bülthoff, Little & Poggio [7]. The approach of using local neighborhoods
to  find  the  displacement  with  the  most  local  support  is  consistent  with 
human  psychophysics,  since  it  exhibits  several  of  the  same  "illusions" 
that  humans  perceive. 
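
As a rough sketch of how the flow field falls out of the boundary computation, the code below takes one vote histogram per pixel and returns, for each pixel, the displacement with the most local support together with a reliability value. The data layout (a dictionary of dictionaries) and the function name are assumptions made here, and the reliability is simplified to the ratio of the second-highest bin to the highest bin rather than a true peak-ratio.

```python
def flow_from_histograms(histograms):
    """Map each pixel to (best displacement, ratio of runner-up to best).

    histograms maps a pixel (x, y) to a dict {(dx, dy): votes}.
    A ratio near 0 indicates an unambiguous estimate; a ratio near 1
    indicates two competing motions, as expected near a boundary.
    """
    flow = {}
    for pixel, hist in histograms.items():
        if not hist:
            continue  # no votes collected at this pixel
        ranked = sorted(hist.items(), key=lambda kv: kv[1], reverse=True)
        best_disp, best_votes = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        ratio = runner_up / best_votes if best_votes > 0 else 1.0
        flow[pixel] = (best_disp, ratio)
    return flow
```

The same pass over the histograms therefore delivers both the flow estimate and an indication of where that estimate should not be trusted.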

1.4.2  The  Second  Stage 

The  pointwise  output  of  the  motion  boundary  estimators  is  often  broadly 
localized  and  it  can  contain  gaps.  The  second  stage  consists  of  applying 
and  modifying  the  Structural  Saliency  Method  developed  by  Sha'ashua  & 
Ullman [37,45] to extract complete and unique boundaries from the
pointwise  output  of  the  first  stage.  Boundary  segments  belonging  to 
differently  moving  objects  are  separated  by  using  the  motion  estimates 
provided  by  the  first  stage  to  constrain  which  edge  segments  can  be 
formed. 




The  Structural  Saliency  Method  employs  a  simple  iterative  network 
and  uses  an  optimization  approach  to  produce  a  "saliency  map",  which 
emphasizes  salient  locations  in  the  image.  The  saliency  of  curves  is 
measured  in  terms  of  their  smoothness  and  length,  which  is  often 
sufficient  to  perform  figure-ground  separation.  The  main  properties  of 
the  network  are:  (i)  the  computations  are  simple  and  local,  (ii)  globally 
salient  structures  emerge  with  a  small  number  of  iterations,  (iii)  there  is 
little  dependence  on  the  complexity  of  the  image,  (iv)  contours  are 
smoothed,  gaps  are  filled  in  and  linking  information  between  edge 
segments  is  provided. 

The optimization problem is formulated in terms of maximizing a
structural saliency measure Q(n) over all curves of length n starting from
P. The computation is linear in n because Q has been constructed to be an
extensible function. Hence, the most salient curve of length n at P can be
found by maximizing over all segments leaving P and the maximal curves
of length (n-1) starting at the respective end-points of these segments.
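
The recursion that extensibility makes possible can be sketched as a small dynamic program. This is a deliberately simplified stand-in for the saliency measure: each segment carries a single additive local score, whereas the actual measure also accounts for smoothness along the curve. The graph layout and the function name are assumptions introduced for illustration.

```python
def most_salient(segments, n):
    """Best score of any curve of at most n segments starting at each point.

    segments maps a point to a list of (end_point, local_score) pairs,
    one pair per segment leaving that point.
    """
    points = set(segments)
    for outgoing in segments.values():
        points.update(end for end, _ in outgoing)

    best = {p: 0.0 for p in points}  # curves of length 0 score nothing
    for _ in range(n):
        # One pass extends every curve by one segment, so n passes suffice.
        best = {
            p: max(
                (score + best[end] for end, score in segments.get(p, [])),
                default=0.0,
            )
            for p in points
        }
    return best
```

Each of the n passes touches every segment once, so the cost grows linearly with n instead of with the number of possible curves, which is exponential in n.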

1.5  Organization  of  the  Thesis 

Chapter 2 discusses previous work on the detection of motion
boundaries. Chapter 3 presents three new methods that can locally
estimate motion boundaries early on: the Bimodality Tests, the
Bi-distribution Test and the Dynamic Occlusion Method. It is shown how to
infer a motion boundary from the computed measures and how the
appropriate thresholds can be derived. Chapter 4 shows how visual
motion can be locally estimated as a by-product of the early estimation of
motion boundaries. A mathematical formulation is provided for the
proposed computation of visual motion and it is demonstrated that the
developed method is well-posed. Chapter 5 introduces the Structural
Saliency Method by Sha'ashua & Ullman and shows how it can be
modified to extract complete and unique boundaries from the pointwise
output of the motion boundary estimators, whose output is often broadly
localized  and  can  contain  gaps.  Chapter  6  shows  the  results  of  applying 
the  methods  to  image  sequences  composed  of  random-dot  or  natural 
textures.  Chapter  7  shows  how  the  methods  can  be  applied  in  stereopsis 
and  surface  reconstruction.  Chapter  8  provides  a  summary  and 
conclusion. 
Previous Work

2.1  Introduction 

The  previous  work  on  the  detection  of  motion  boundaries  can  be 
categorized  by  making  the  following  two  distinctions.  First,  there  are  at 
least  two  ways  to  describe  what  takes  place  at  an  object  boundary  in  the 
presence  of  motion.  One  is  that  regions  of  a  more  distant  object  will,  in 
general,  either  appear  or  disappear  from  view  over  time  at  an  object 
boundary.  The  other  is  to  observe  that  if  two  adjacent  surfaces  undergo 
different  motions  or  are  separated  in  depth  then  they  will  give  rise  to  a 
motion discontinuity along their boundary. Second, the approaches can be
further differentiated by the stage at which the detection of motion
boundaries is performed: prior to, simultaneously
with, or following the computation of the image flow field.

2.2  Detecting  Discontinuities  Prior  to  the  Computation  of  the 
Flow  Field 

Reichardt et al. [34], working on the figure-ground
discrimination of the house-fly, propose a method in which direction-selective movement
detectors inhibit flicker detectors when the same movement appears in
the center and surround of the motion detectors. Hence, flicker detectors
with significant activity indicate the presence of motion boundaries.

Marr  &  Ullman  [23]  and  Hildreth  [15]  use  the  flow  component  in  the 
direction  of  the  intensity  gradient,  also  called  the  normal  flow 
component,  to  detect  motion  boundaries.  They  make  use  of  the  fact  that  if 
two adjacent objects undergo different motions v1 and v2, then the
normal flow components whose orientations lie between the directions
of (v1 + 90°) and (v2 + 90°) or (v1 - 90°) and (v2 - 90°) will change in sign
across the boundary (see Figure 3.2). Therefore, a change in the sign of
normal flow components with appropriate and roughly equal orientation
signals a motion boundary. This method is limited by the fact that the
number of flow components whose orientations lie between the directions
of (v1 + 90°) and (v2 + 90°) or (v1 - 90°) and (v2 - 90°) will decrease as
an image becomes less textured and the angle between v1 and v2 becomes
smaller.  Furthermore,  the  neighborhood,  over  which  measurements  are 
collected,  will  have  to  be  large  so  that  there  will  be  a  sufficient  number  of 
normal  flow  components,  whose  signs  can  be  compared. 

2.3  Detecting  Discontinuities  After  the  Computation  of  the  Flow 
Field 

Nakayama et al. [29] propose to detect boundaries by using a
center-surround operator that signals image flow differences between the center
and surround, but their method has not been implemented and tested.
Potter [31] employs region growing techniques to group features of similar
velocity,  assuming  that  the  image  flow  field  is  due  to  translation. 
Clocksin  [9]  shows  that  object  and  depth  boundaries  give  rise  to 
discontinuities  in  the  magnitude  of  flow  created  by  an  observer 
translating  in  a  static  environment. 

For the more general case of unconstrained motion, Thompson et al.
[41] show that object boundaries give rise to discontinuities in the image
flow field. In principle, these sharp changes could be detected as
zero-crossings in the Laplacian of the components of the flow field. In a
preceding paper, Thompson et al. [1982] computed the image flow field
using a token-matching method. Because the resulting flow field was
sparse, they had to smoothly interpolate between the feature points at
which the flow field was defined. Without the knowledge of the location
of the object boundaries, their interpolation scheme smoothed over the
boundaries. As a result, the motion boundaries that could still be detected
by the Laplacian operator were poorly localized.
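
The zero-crossing idea can be illustrated on a single scanline of one flow component. The sketch below is a one-dimensional simplification introduced here, not the operator used in the cited work: the Laplacian reduces to a second difference, and a crossing is reported midway between two consecutive nonzero samples of opposite sign.

```python
def laplacian_zero_crossings(u):
    """Positions (in index units of u) where the second difference of u
    changes sign. u is a list of flow values along a scanline."""
    # Discrete 1-D Laplacian, defined for the interior samples of u.
    lap = [u[i - 1] - 2 * u[i] + u[i + 1] for i in range(1, len(u) - 1)]

    crossings = []
    prev = None  # (position in u, value) of the last nonzero sample
    for j, value in enumerate(lap):
        if value == 0:
            continue  # flat stretches carry no sign information
        pos = j + 1  # lap[j] belongs to u[j + 1]
        if prev is not None and prev[1] * value < 0:
            crossings.append((prev[0] + pos) / 2)
        prev = (pos, value)
    return crossings
```

A sharp step in the flow produces a single crossing at the step. Once the step has been blurred by interpolation, the positive and negative lobes of the Laplacian become weaker and move apart, so the crossing is easily displaced by noise, which is consistent with the poor localization reported above.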

Schunck [35] computes the image flow field using a motion constraint
line clustering algorithm. He assumes that the flow field is due to the
translation of objects in the scene under orthographic projection. Object
boundaries are detected by using an edge detector that locates the sharp
changes in the components of the flow field. Schunck utilizes an iterative
procedure that interleaves the application of an edge detector with a
smoothing of the computed flow field, in order to reduce the noise that is
causing the erroneously detected boundaries.

Adiv  [1]  first  partitions  a  flow  field  into  connected  segments,  where 
each  segment  is  consistent  with  a  rigid  motion  of  a  roughly  planar 
surface.  A  global,  multipass  Hough  transform  is  used  to  determine  the 
parameters  describing  the  motion  and  the  plane.  The  segments  are  then 
grouped  under  the  hypothesis  that  they  are  created  by  a  single,  rigidly 
moving  object,  by  searching  for  the  motion  parameters  that  are 
compatible  with  all  the  segments  in  the  corresponding  group. 

Terzopoulos  [40]  proposes  to  detect  discontinuities  in  sparse  surface 
representations  by  marking  locations  where  the  thin  plate  used  to 
interpolate  between  the  sparse  data  points  has  an  inflection  point  and  its 
gradient  is  above  some  threshold.  To  overcome  the  shortcoming  that  the 
smoothing  thin  plate  tends  to  obscure  boundaries,  a  cost  is  also 
introduced for the placement of a boundary, leading to a non-convex cost
functional  that  has  to  be  minimized. 

2.4  Detecting  Dynamic  Occlusion  After  the  Computation  of  the 
Flow  Field 

An  example  of  the  approach  that  also  detects  boundaries  after  the  flow 
field  computation,  but  uses  the  fact  that  dynamic  occlusion  occurs  at 
object  boundaries,  is  the  work  of  Mutch  &  Thompson  [26].  They  use  a 
relaxation  technique  to  compute  the  flow  field.  Areas  in  the  image  with  a 
high  percentage  of  features  that  do  not  have  a  match  in  the  previous  or 
subsequent  frame  are  identified  as  regions  that  have  appeared  or 
disappeared,  respectively. 

2.5  The  Simultaneous  Computation  of  the  Flow  Field  and  its 
Discontinuities 

Wohn  &  Waxman  [47]  suggest  a  scheme  where  the  motion  segmentation 
is performed by detecting "boundaries of analyticity", that is, where an
approximation  of  the  local  flow  field  by  second  order  polynomials  breaks 
down.  The  boundaries  are  located  within  the  process  that  models  the 
local  flow  field. 
Hutchinson, Koch, Luo & Mead [18] and Gamble & Poggio [12] propose
that binary line processes, first introduced in the Markov Random Field
method developed by [13], can signal boundaries. At locations where such
a line process is set, an edge is postulated, ensuring that the smoothness
assumption is not imposed across it. The computation of the image
flow field and the activation of the binary line processes are then
performed so as to minimize a non-convex energy functional.

Hutchinson et al. and Gamble & Poggio restrict the location of motion
boundaries  to  coincide  with  the  location  of  intensity  edges.  This  strategy 
effectively  prevents  motion  boundaries  from  forming  at  locations  where 
no  intensity  edges  exist,  unless  strongly  suggested  by  motion  data. 
Conversely,  however,  intensity  edges  by  themselves  will  not  induce  the 
formation  of  discontinuities  in  the  absence  of  sharp  changes  in  motion. 

Hutchinson  et  al.  introduce  the  following  procedure  to  cope  with  the 
different  velocity  gradients  that  are  generally  present  in  a  scene.  The 
formation  of  lines  is  initially  strongly  penalized,  encouraging  a  smooth 
image  flow  field  everywhere  except  at  very  steep  velocity  gradients.  A 
smaller  price  has  to  be  paid  subsequently,  and  the  image  flow  field  will 
break  at  smaller  flow  gradients.  The  final  state  of  the  network  is 
independent  of  the  limiting  flow  gradient,  and  their  method  has  been 
successfully  applied  to  motion  sequences. 



Chapter  3 

The  Early  Estimation  of  Motion 
Boundaries 

3.1  Introduction 

In  this  chapter,  we  will  describe  three  new  methods  that  can  estimate 
motion  boundaries  at  an  early  stage  in  the  processing  of  visual  informa¬ 
tion,  using  only  motion  and  no  intensity  boundary  information.  The 
methods  make  use  of  the  following  two  facts.  First,  object  boimdaries  give 
rise  to  discontinuities  in  the  flow  field,  i.e.,  the  velocities  on  the  two  sides 
of  a  boundary  cluster  around  two  different  points  in  a  velocity  histogram. 

Second, dynamic occlusion occurs at an object boundary in the presence
of  motion,  and  therefore  spatial  relationships  between  simple  image 
features  change  most  dramatically  in  the  vicinity  of  motion  boundaries. 

This  chapter  consists  of  four  parts.  First,  we  will  describe  the 
Bimodality  Tests  that  estimate  motion  boundaries  by  computing  the 
degree  of  bimodality  present  in  the  local  histograms  of  the  potential 
displacements  or  normal  flow  components.  Second,  we  will  introduce  an 
application of the Kolmogorov-Smirnov Test in the Bi-distribution Test,
which detects boundaries by measuring the probability that two histograms
have  been  created  by  the  same  population  of  motions.  Third,  we  will 
discuss  how  to  infer  the  presence  of  a  motion  boundary  from  the 
measures  computed  by  the  Bimodality  Tests  and  the  Bi-distribution  Test. 

Fourth, we will describe the Dynamic Occlusion Method that makes use
of the fact that thin-bars are created or destroyed at a motion boundary.

Before  describing  the  methods  in  detail,  we  will  discuss  the  matching 
primitives used, and why either the local histograms of the potential
displacements or the normal flow components contain sufficient information
to estimate motion boundaries. We will also outline the constraints
that  can  be  used  to  filter  the  local  histograms  and  how  to  handle  images 
that  contain  only  little  texture  and  are  sensitive  to  the  effects  of  noise. 


Observation
• Object and depth boundaries give rise to discontinuities in the visual flow field and cause dynamic occlusion.

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The  Bimodality  Tests  and  the  Bi-distribution  Test  are  flexible  in  terms  of 
the  matching  primitives  used,  since  either  intensities,  zero-crossings  or 
other  edge  features  can  be  used. 

Intensity  values  have  the  advantage  that  the  potential  displacements 
can  be  computed  at  almost  every  point.  Hence,  the  density  of  matching 
primitives  will  be  uniform  across  a  boundary,  and  the  methods  will  be 
more  robust,  because  there  will  be  more  contributors  to  the  local 
histograms.  The  intensity  values  are  also  smoothed  by  convolving  them 
with  a  Gaussian  filter  to  increase  their  reliability. 

We  are,  however,  implicitly  assuming  that  the  intensity  values  at 
corresponding  points  do  not  change  greatly,  although  they  are  sensitive 
to  noise  and,  more  importantly,  to  changes  in  illumination.  These  effects 
will be minor as long as there is sufficient texture in the image. The problem
will be more serious in parts of the image where intensity changes
slowly.  To  account  for  these  gradual  changes  in  intensity,  we  use  a 
Gaussian  matching  function,  which  depends  on  the  difference  in 
intensity  at  the  two  points  which  define  a  particular  displacement,  to 
weigh  the  possible  displacements  of  a  point.  The  smaller  the  difference  in 
intensity, the greater the weight that is assigned to a particular
displacement.  The  spread  of  the  Gaussian  matching  function  can  be 
chosen  to  reflect  the  estimated  noise  in  the  intensity  measurements. 
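
The weighting described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the unnormalized form exp(-diff² / 2σ²), and the list-of-rows image layout are all assumptions made for the example.

```python
import math


def match_weight(i1, i2, sigma):
    """Gaussian weight for a candidate match between intensities i1 and i2.

    sigma (assumed parameter) reflects the estimated intensity noise.
    """
    diff = i1 - i2
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))


def displacement_weights(frame1, frame2, x, y, max_disp, sigma):
    """Weight every potential displacement (dx, dy) of the point (x, y).

    frame1 and frame2 are lists of rows of intensities (hypothetical layout).
    Candidates that fall outside frame2 are skipped.
    """
    height, width = len(frame2), len(frame2[0])
    weights = {}
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            xx, yy = x + dx, y + dy
            if 0 <= xx < width and 0 <= yy < height:
                weights[(dx, dy)] = match_weight(
                    frame1[y][x], frame2[yy][xx], sigma
                )
    return weights
```

A larger sigma flattens the weights, so noisier intensity measurements penalize imperfect matches less.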


Using  zero-crossings  or  other  edge  features  as  matching  primitives  has 
the  advantage  that  they  are  more  likely  to  be  tied  to  a  physical  event  in 
the  scene,  and  are  therefore  more  stable  with  respect  to  noise  and  changes 
in illumination¹. These primitives, however, have the disadvantage that
they  tend  to  be  sparse,  and  their  density  can  be  non-uniform  across  the 
image.  In  particular,  the  less  textured  the  image,  the  greater  the  size  of  the 
histogram  neighborhood  needs  to  be  for  there  to  be  sufficient  contributors 
to  the  local  histograms.  This  increase  in  the  size  of  the  histogram 
neighborhood, however, can decrease the robustness of the developed
methods because it increases the likelihood that the image flow field
changes too rapidly over the spatial support used to compute the
histograms.

¹ It has been noted that methods superficially so different as edge-based and intensity-based flow
field computations give very similar results and are to a certain degree equivalent [7].

3.1.2  Input  Representation 

The input representation used by the Bimodality Tests and the Bi-distribution
Test is a local histogram constructed at each image point. The
matching  primitives  that  lie  within  a  circular  neighborhood  will 
contribute  either  the  match  scores  for  all  the  possible  displacements  or 
their  normal  flow  component  to  the  local  histogram.  The  radius  of  the 
spatial  support  used  to  compute  the  histograms  will  typically  range 
between  five  and  eight  pixels. 

The  local  histograms  of  the  potential  displacements  contain  sufficient 
information  to  infer  the  presence  of  motion  boundaries,  because,  in  a 
region  that  is  translating  locally,  all  the  matching  primitives  will  have 
one potential displacement in common, namely, the one which corresponds
to the translation of the region. Thus, there will be a single strong
peak at the location in the histogram that corresponds to the local translation².
In the vicinity of an object boundary, the local histogram will have
two peaks of roughly equal height because the matching primitives in
one half of the histogram neighborhood will have one displacement in
common, whereas the other half will have a different displacement in
common. Hence, motion boundaries give rise to local histograms that
have  a  bimodal  distribution  (see  Figure  3.1). 

As  previously  noted,  the  local  motion  measurements  provide  only 
the  normal  flow  components.  These  components,  however,  provide 
sufficient  information  to  detect  motion  boundaries  for  the  following 
reason.  Normal  flow  components  that  have  the  same  orientation  will 
have  both  the  same  sign  and  roughly  equal  magnitude  in  a  region  that  is 
locally  translating.  If,  however,  two  adjacent  objects  move  differently 
then the normal flow components of most orientations will have different
magnitudes across the boundary (see Figure 3.2).


• The potential displacements of points in the vicinity of a motion boundary will cluster around two different points in a local histogram, which collects the votes for the different possible motions.


² Provided the motion primitives are not arranged in a regular pattern, as would be the case for
an image composed of stripes, causing the resulting histogram to contain ridges.

Figure 3.1 The Information Provided by the Potential Displacements.

Shows a 1-D slice through the two-dimensional local histograms that collect the
potential displacements of the points that lie within a circle centered at the locations (x1,
y0), (x2, y0), (x3, y0), respectively. The solid vectors represent the correct local
displacements and the dashed vectors represent the other, spurious potential
displacements.


Figure 3.2 The Information Provided by the Normal Flow Vectors.

(a) and (b) show for which orientations of the normal flow vector N the sign of its
component will be positive or negative with respect to v1 and v2, respectively.
(c) Combines results of (a) and (b) and the textured areas show for which orientations the
component of the normal flow vector N will be of opposite sign across a motion boundary.

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Hence,  a  histogram  of  the  normal  flow  components  that  lie  in  the 
same  narrow  orientation  range  will  be  bimodal  at  a  motion  boundary. 
The distance between the two modes will be a function of the angle α
between the normal flow vector N and the bisector of v1 and v2 as well as
the resolution of the histogram. The smaller the angle α, the greater the
distance between the two peaks will be. The resolution of the histograms
can be chosen arbitrarily, but there is the following trade-off: the coarser
the resolution, the more robust the histograms, but the fewer the
orientation ranges that will display bimodality at a motion boundary,
and the larger the flow difference across a boundary will have to be in
order for there to be two distinct modes in the histogram. We will choose
the resolution to be equal to the one used for the potential displacements.
This should ensure that the histograms will be robust and that there will
be a sufficient number of disjoint orientation ranges that are sensitive to
motion boundaries. We will detect motion boundaries by computing the
local  histograms  for  a  number  of  disjoint  orientation  ranges  and 
analyzing  them,  using  the  methods  that  will  be  described  below. 

The  use  of  the  normal  flow  components  to  segment  a  scene  extends 
the  work  by  Marr  &  Ullman  [23]  and  by  Hildreth  [15]  in  two  ways.  First,  it 
uses  the  magnitude  as  well  as  the  sign  of  the  components  to  detect 
motion  boundaries.  Second,  the  flow  components  at  any  point  where  the 
matching  primitives  of  our  choice  are  defined  will  contribute  to  the  local 
histogram,  instead  of  just  the  normal  flow  components  that  can  be 
measured  along  contours. 


The  information  provided  by  the  measured  normal  flow  vector  N 
could  also  be  used  in  another  way  to  detect  motion  boundaries.  The 
normal  flow  vector  at  a  point  P  defines  a  line  q  on  which  its 
corresponding  point  P'  in  the  next  frame  has  to  lie,  (see  Figure  3.3). 
Hence,  in  a  region  that  is  locally  translating,  all  lines  defined  by  the 
normal  flow  components  will  intersect  roughly  at  the  location  in  a 
velocity  histogram  that  corresponds  to  the  local  translation  of  the  region. 
At points in the vicinity of a motion boundary, the local histogram will
have  two  peaks  of  roughly  equal  height,  because  the  lines  defined  by  the 
normal  flow  components  in  one  half  of  the  neighborhood  will  intersect 
at  one  particular  point,  whereas  the  ones  from  the  other  half  will 
intersect at a different point.


3.1.3  Ways  to  Filter  the  Histograms 

In  this  part,  we  will  describe  the  constraints  that  can  be  used  to  remove 
some  of  the  incorrect  potential  displacements  of  a  motion  primitive.  This 
filtering  reduces  the  noise  in  the  local  histograms  and  it  sharpens  the 
peaks. 
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The  first  constraint  is  that  corresponding  matching  primitives  must 
have  the  same  sign  of  contrast,  i.e.  the  scalar  product  of  their  intensity 
gradients  must  be  positive.  Similarly,  the  angle  between  the  intensity 
gradients  at  corresponding  primitives  should  be  within  a  certain  bound 
for  small  rotations.  The  second  constraint  is  that  the  normal  flow  vector 
N at a point P defines a line q on which the corresponding point in the
subsequent frame has to lie. Hence, a rectangular window can be specified
within which the corresponding motion primitive must lie, where the
dimensions of this window are chosen to account for errors in the
measured flow components. This constraint greatly reduces the number
of potential displacements (see Figure 3.3). The third constraint is that a
match  must  lie  in  the  intersection  of  the  bands  defined  by  the  normal 
flow  components,  which  have  been  measured  at  different  scales. 
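
The first two constraints can be sketched as simple predicates on a candidate match. This is an illustrative sketch only: the function names, the 30 degree default bound, and the tolerance parameters are assumptions, and the rectangular window is modelled as a band of half-width `tol` around the constraint line, limited to `max_disp` along the line.

```python
import math


def passes_gradient_constraints(g1, g2, max_angle_deg=30.0):
    """Check contrast sign and gradient orientation for two (gx, gy) gradients.

    The scalar product must be positive (same sign of contrast) and the
    angle between the gradients must not exceed max_angle_deg (assumed bound).
    """
    dot = g1[0] * g2[0] + g1[1] * g2[1]
    if dot <= 0.0:
        return False
    cos_angle = dot / (math.hypot(*g1) * math.hypot(*g2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


def within_normal_flow_window(disp, normal_flow, tol, max_disp):
    """Check that displacement disp lies in a window around the constraint line.

    The line is perpendicular to the (non-zero) normal flow vector and passes
    through its tip; tol allows for measurement error across the line and
    max_disp bounds the displacement along it.
    """
    mag = math.hypot(*normal_flow)
    if mag == 0.0:
        raise ValueError("normal flow vector must be non-zero")
    along = (disp[0] * normal_flow[0] + disp[1] * normal_flow[1]) / mag
    across = (disp[0] * normal_flow[1] - disp[1] * normal_flow[0]) / mag
    return abs(along - mag) <= tol and abs(across) <= max_disp
```

Candidates that fail either predicate would simply not vote in the local histogram.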


Figure 3.3 The Normal Flow Constraint

The normal flow component N at a point P defines a line q on which the corresponding
point in the subsequent frame has to lie. A rectangular window can be specified within
which the corresponding motion primitive must lie, and its dimensions are chosen to
account for measurement errors and the maximal expected displacement.




3.1.4  Ways  to  Handle  Images  with  Sparse  Texture 


Motion  sequences  that  have  sparse  texture  are  very  sensitive  to  the  effects 
of  noise.  Hence,  the  intensity  values  at  corresponding  points  will  most 
likely  not  be  the  same,  and  the  potential  displacements  that  have  been 
computed  using  the  Gaussian  matching  function,  which  favors  constant 
intensity,  will  assign  the  highest  weight  to  the  wrong  displacements. 


• Use magnitude of intensity gradient or its local average to suppress false alarms in regions with little texture.


We  try  to  solve  this  problem  by,  firstly,  weighing  the  contributions  to 
the  local  histograms  based  on  the  magnitude  of  their  intensity  gradient  or 
by  allowing  points  to  contribute  only  if  their  gradient  is  above  a  certain 
threshold.  This  places  our  scheme  midway  between  area-  and  edge-based 
approaches.  Edge  locations  are  favored  because  the  gradient  is  high,  but 
other  places  contribute  as  well.  Secondly,  we  compute  the  average  of  the 
gradient  over  the  neighborhood  used  to  compute  the  histograms  and  we 
suppress  the  output  of  the  methods  that  estimate  motion  boundaries  if 
the average is not above a chosen threshold.

3.2 The Bimodality Tests

We  will  now  present  two  methods  that  locate  motion  boundaries  by 
detecting  the  resulting  bimodality  in  the  local  histograms  computed  at  a 
boundary.  In  the  first  method,  three  measures  are  computed  that  are 
sensitive  to  the  degree  of  bimodality  in  the  histograms.  We  will  discuss 
the assumption of local translation that underlies these three measures,
and introduce a Gaussian spatial support function as a way to relax this
assumption. The second method detects bimodality by applying the chi-square
test. In this discussion, we will consider the case where the
potential  displacements  are  the  input  to  the  local  histograms,  but  what 
will  be  said  applies  equally  well  to  the  normal  flow  components. 

3.2.1  The  Ratio  Measures 

This  method  consists  of  three  measures  that  each  capture  and  monitor  a 
different  characteristic  of  a  motion  boundary.  The  local  histograms  must 
contain two modes of roughly equal height at a boundary, assuming local
translation. This is captured by the peak-ratio. At a motion boundary the
votes will not just cluster around the correct displacement, the "signal",
but will be more spread out due to the votes from the other side of the
boundary. This is measured by the signal-noise-ratio. Finally, the
displacement receiving the most votes should receive minimal local
support at a motion boundary, which is measured by the local-support-ratio.
These three ratios all have a global extremum at a motion
boundary, and their local extrema anywhere else in the image are weakly
correlated with each other.

The  Peak-Ratio  measures  the  degree  of  bimodality  by  comparing  the 
heights of the two highest peaks in a local histogram. It is equal to the
ratio of the height of the second highest peak to the height of the highest
peak. Hence, the height of the peaks is used to represent the strength of
the  peaks.  This  is  a  reasonable  approximation  to  make  as  long  as  the  local 
flow  field  can  be  assumed  to  be  constant  over  the  spatial  support  used  to 
compute  the  local  histogram. 

When we compute the two highest peaks, we require that their
respective neighboring displacements received strictly fewer votes. This
ensures that the two highest peaks are separated by at least two
displacement units and, furthermore, that no motion boundaries are
asserted within a moving object whose image flow field is composed of
patches of uniform motion that differ by one displacement unit.

The  peak-ratio  will  be  small  in  a  region  that  is  locally  translating.  This 
is  because  there  will  be  one  strong  peak  at  the  location  corresponding  to 
the  local  translation,  while  the  second  highest  peak,  which  will  be  due  to 
the  incorrect  potential  displacements,  will  be  small  in  comparison.  At  a 
boundary  the  two  highest  peaks  will  be  of  roughly  equal  height,  because 
the  matching  primitives  in  one  half  of  the  spatial  support  will  have  one 
particular  displacement  in  common,  whereas  the  matching  primitives  in 
the  other  half  will  have  another  displacement  in  common  that  receives 
the  highest  matching  score.  Thus,  the  peak-ratio  will  generally  have  a 
global maximum close to 1.0 at a motion boundary (see Figure 3.4).
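
A minimal sketch of the peak-ratio on a two-dimensional displacement histogram is given below. It assumes that "neighboring displacements" means the 8-connected neighbors of a bin and that a histogram with fewer than two peaks yields a ratio of 0.0; both choices, and the function names, are illustrative assumptions.

```python
def find_peaks(hist):
    """Return [(votes, (row, col)), ...] for strict local maxima, highest first.

    A bin counts as a peak only if every existing 8-neighbor received
    strictly fewer votes.
    """
    rows, cols = len(hist), len(hist[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            votes = hist[r][c]
            is_peak = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and hist[rr][cc] >= votes:
                        is_peak = False
            if is_peak:
                peaks.append((votes, (r, c)))
    peaks.sort(reverse=True)
    return peaks


def peak_ratio(hist):
    """Height of the second highest peak divided by the height of the highest."""
    peaks = find_peaks(hist)
    if len(peaks) < 2 or peaks[0][0] == 0:
        return 0.0
    return peaks[1][0] / peaks[0][0]
```

Because adjacent bins can never both be strict maxima, two displacements that differ by one unit cannot produce a high peak-ratio, which matches the separation requirement stated above.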


The Signal-Noise-Ratio is equal to the ratio of the number of votes for
the highest peak and its neighbors to the number of votes for the remaining
displacements in the histogram. In a region that is locally translating,
all the points in the histogram, other than the one corresponding
to the local translation, will receive some votes due to the incorrect
potential displacements. We will refer to these votes as the noise activity
in the histogram. The signal-noise-ratio will have a global minimum at a
motion boundary because the heights of the highest peak and its neighbors,
the "signal", will decrease, whereas the noise activity will increase
due to the votes from the other side of the boundary (see Figure 3.4).

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The  Local-Support-Ratio  measures  how  many  of  the  contributors  to 
the  local  histogram  have  supported  the  displacement  with  the  most 
votes.  This  measure  is  equal  to  the  ratio  of  the  height  of  the  highest  peak 
and  the  maximal  possible  local  support  (which  is  equal  to  the  area  of  the 
neighborhood  used  to  compute  the  histogram,  provided  that  all  points 
are  weighted  equally,  see  also  section  3.2.2).  The  local-support-ratio  will  be 
close  to  1.0  in  a  region  that  is  locally  translating  because  almost  all  the 
points  will  have  a  potential  displacement  that  receives  the  highest 
matching  score  and  is  equal  to  the  local  translation.  It  will  have  a  global 
minimum  below  0.5  at  a  boundary,  because  at  least  half  of  the  matching 
primitives  will  not  have  a  potential  displacement  that  votes  most 
strongly  for  the  highest  peak  (see  Figure  3.4). 
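
The other two ratio measures can be sketched in the same style. The sketch assumes that the "neighbors" of the highest peak are its 8-connected bins and that the maximal local support is passed in by the caller (for equal weights, the area of the histogram neighborhood); the function names are hypothetical.

```python
def local_support_ratio(hist, max_support):
    """Height of the highest peak divided by the maximal possible local support."""
    highest = max(max(row) for row in hist)
    return highest / max_support


def signal_noise_ratio(hist):
    """Votes for the highest bin and its 8-neighbors over all remaining votes."""
    rows, cols = len(hist), len(hist[0])
    total = sum(sum(row) for row in hist)
    # Highest bin; on ties the first one in scan order is used.
    pr, pc = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: hist[rc[0]][rc[1]],
    )
    signal = sum(
        hist[r][c]
        for r in range(max(0, pr - 1), min(rows, pr + 2))
        for c in range(max(0, pc - 1), min(cols, pc + 2))
    )
    noise = total - signal
    return float("inf") if noise == 0 else signal / noise
```

On a histogram with one dominant peak both ratios are high; when the votes split between two separated peaks both ratios drop, as described above.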

3.2.1.1 The Local Translation Assumption and Ways to Relax It

The assumption that underlies the computation of the above three
measures  is  that  the  visual  flow  field  is  locally  constant  over  the  spatial 
support  used  to  compute  the  local  histograms.  This  assumption  is  strictly 
only  true  for  the  projected  flow  field  of  a  3D  planar  surface  patch, 
translating  parallel  to  the  image  plane  under  orthographic  projection.  It 
is,  however,  a  satisfactory  local  assumption,  and  it  is  sufficient  to  just  use 
the  height  of  the  peaks  to  compute  the  degree  of  bimodality  present  in  the 
histograms.  It  is  also  assumed  that  the  matching  primitives  are  not 
arranged in a regular pattern, as would be the case for an image composed
of  stripes,  which  would  cause  the  resulting  histogram  to  contain  ridges. 

The  size  of  the  histogram  neighborhood  imposes  an  upper  limit  on 
the  magnitude  of  the  flow  field  gradient  that  can  be  tolerated,  so  that  the 
local translation assumption still holds. In general, there is also the
following  trade-off  between  the  size  of  the  histogram  neighborhood,  how 
much  the  flow  field  can  change  locally  and  the  robustness  of  the 
histogram  method:  the  smaller  the  size  of  the  histogram  neighborhood, 
the  steeper  the  slope  of  the  flow  gradient  can  be.  The  smaller  the 
neighborhood,  the  less  robust  the  three  measures,  because  there  will  be 
fewer  contributors  to  the  local  histogram.  We  employ  a  circular 
neighborhood  for  the  construction  of  the  histograms,  with  a  radius 
between  five  and  eight  pixels.  This  range  of  radii  has  proved  sufficient  to 
estimate motion boundaries reliably.

There  are  at  least  three  ways  to  handle  the  situation  where  the 
assumption  of  local  translation  should  be  relaxed  and  the  local  flow  field 
changes  too  quickly  over  the  spatial  support  used  to  construct  the  local 
histograms. First, we can use a Gaussian spatial support function that
weighs contributors to a local histogram less if they are farther away from
the point at which the histogram is computed. This will account for the
fact that the flow vectors at points farther apart are less likely to be equal
in a smoothly varying flow field. It will also sharpen the response of the
ratio measures, as is shown in section 3.4.1.1. Second, the flow field can be
"slowed  down"  by  using  a  coarser  resolution  for  the  histogram.  Third,  a 
measure  of  the  broadness  of  the  peaks  could  be  computed  and 
incorporated  in  the  analysis. 
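
The first option, a Gaussian spatial support function, might look as follows. This is a sketch under stated assumptions: the weight is an unnormalized Gaussian of the distance to the histogram center, truncated at the neighborhood radius, and the vote format and names are invented for the example.

```python
import math


def gaussian_support_weight(dx, dy, radius, sigma):
    """Spatial weight of a contributor at offset (dx, dy) from the histogram center.

    Contributors outside the circular neighborhood get weight 0.
    """
    dist2 = dx * dx + dy * dy
    if dist2 > radius * radius:
        return 0.0
    return math.exp(-dist2 / (2.0 * sigma * sigma))


def weighted_histogram(votes, radius, sigma):
    """Accumulate votes into a histogram keyed by displacement.

    votes is an iterable of ((dx, dy), displacement, match_score) tuples
    (hypothetical format); each score is scaled by the spatial weight.
    """
    hist = {}
    for (dx, dy), disp, score in votes:
        w = gaussian_support_weight(dx, dy, radius, sigma)
        if w > 0.0:
            hist[disp] = hist.get(disp, 0.0) + w * score
    return hist
```

With this weighting the maximal local support is the sum of the weights over the neighborhood rather than its area.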

3.2.2 A Statistical Test

Due  to  noisy  intensity  measurements,  the  potential  displacements  of 
many  of  the  matching  primitives  receiving  the  highest  weight  may  not 
contain the correct displacement. This could cause the peaks to be broad
or ill-defined. It could also have the effect that the second highest peak is
just  a  major  sub-peak  of  the  highest  peak.  These  concerns  lead  us  to 
consider  the  following  statistical  method. 

The  Chi-Square  Test  will  measure  how  well  a  Gaussian  distribution 
can  be  fitted  to  a  local  histogram.  Motion  boundaries  cause  the 
distribution  in  the  local  histograms  to  be  bimodal,  whereas  anywhere  else 
the histograms will be unimodal. Due to noise and errors in the intensity
measurements,  the  peaks  of  the  histograms  might  not  be  well  defined, 
but  their  unimodal  or  bimodal  nature  will  be  preserved.  We  estimate  the 
parameters  of  the  Gaussian  distribution  by  requiring  that  it  be  centered  at 
and  pass  through  the  highest  peak  of  the  local  histogram.  Hence,  the 
error  of  trying  to  fit  a  Gaussian  distribution  to  the  histogram  will  be 
maximal  in  the  vicinity  of  a  boundary  (see  Figure  3.4). 
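
The fitting error can be illustrated on a one-dimensional slice of a histogram. The text fixes the center and height of the Gaussian but not its width, so in this sketch `sigma` is a caller-supplied assumption; the `floor` on the expected counts, which avoids division by near-zero values in the tails, is likewise an illustrative choice and not taken from the thesis.

```python
import math


def chi_square_gaussian_error(hist, sigma, floor=1.0):
    """Chi-square style error of a Gaussian pinned to the highest bin of hist.

    The Gaussian is centered at the highest bin and passes through it.
    Expected counts below floor are clamped to floor.
    """
    peak_index = max(range(len(hist)), key=lambda i: hist[i])
    peak_height = hist[peak_index]
    error = 0.0
    for i, observed in enumerate(hist):
        offset = i - peak_index
        expected = peak_height * math.exp(-(offset * offset) / (2.0 * sigma * sigma))
        expected = max(expected, floor)
        error += (observed - expected) ** 2 / expected
    return error
```

A second mode far from the highest peak falls where the fitted Gaussian expects almost no votes, so it dominates the error.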

3.3 The Bi-distribution Test

The  input  to  this  method  is  also  a  local  histogram  of  the  potential 
displacements  or  normal  flow  components.  The  difference,  however,  is 
that  it  attempts  to  detect  motion  boundaries  by  comparing  histograms 
computed  at  different  image  points,  rather  than  by  analyzing  the 
individual  histograms. 

3.3.1  A  Non-Parametric  Statistical  Test 

The  Kolmogorov-Smirnov  Test  measures  the  probability  that  two 
local  histograms  have  been  created  by  the  same  population  of  motions.  It 
does  this  by  computing  the  maximal  absolute  difference  between  the 
cumulative distribution functions of the two histograms. The Kolmogorov-Smirnov
measure will be maximal in the vicinity of a motion boundary
because  the  histograms  on  either  side  of  the  boundary  are  created  by 
different  populations  of  displacements  (see  Figure  3.4). 

At each image point the Kolmogorov-Smirnov measure is computed
by  comparing  the  histograms  constructed  at  two  points,  whose  connecting 
line  passes  through  the  point  in  question,  and  which  are  separated  by 
twice  the  radius  of  the  histogram  neighborhood.  Several  orientations  of 
this  connecting  line  are  used  to  detect  motion  boundaries  of  all 
orientations.  The  Kolmogorov-Smirnov  measure  that  is  assigned  to  a 
point  is  the  maximum  of  the  measures  that  have  been  computed  for  each 
of  the  chosen  orientations. 
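
The measure itself can be sketched as the maximal absolute difference between two normalized cumulative histograms. The sketch assumes the histograms have been flattened to one-dimensional lists with the same bin order (the ordering of a two-dimensional histogram is an assumption that affects the value) and computes only the statistic, not the associated probability; the function names are hypothetical.

```python
def ks_measure(hist_a, hist_b):
    """Maximal absolute difference between the cumulative distributions."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must contain votes")
    cum_a = cum_b = 0.0
    largest = 0.0
    for votes_a, votes_b in zip(hist_a, hist_b):
        cum_a += votes_a / total_a
        cum_b += votes_b / total_b
        largest = max(largest, abs(cum_a - cum_b))
    return largest


def point_measure(histogram_pairs):
    """Measure assigned to a point: the maximum over all tested orientations.

    histogram_pairs holds one (hist_a, hist_b) pair per orientation of the
    connecting line.
    """
    return max(ks_measure(a, b) for a, b in histogram_pairs)
```

Identical histograms give 0, and histograms whose votes fall in disjoint bins give 1.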

This  test  has  the  advantage  that  it  does  not  depend  on  the  form  of  the 
histograms  that  are  being  compared.  Also  not  a  great  deal  needs  to  be 
known  about  the  nature  of  the  two  histograms.  There  are,  however,  the 
following  limitations  and  trade-offs  when  comparing  the  histograms 
constructed  at  two  different  points:  the  more  the  spatial  supports  used  to 
compute  the  two  histograms  overlap,  the  less  the  two  histograms  will 
differ.  The  greater,  however,  the  distance  between  the  two  points,  the 
more  likely  it  will  be  that  two  histograms  have  been  created  by  different 
populations of motions, although the two points might still belong to the
same  object.  For  example,  if  the  Kolmogorov-Smirnov  measure  is 
computed  in  the  center  of  a  rotating  object  then  it  will  be  maximal  there, 
because  any  two  histograms  that  are  being  compared  will  have  their 
peaks  at  different  locations  (e.g.  notice  the  high  Kolmogorov-Smirnov 
measure  at  the  center  of  the  rotating  circle  in  Figure  6.1). 

Figure 3.4 The Developed Measures to Estimate Motion Boundaries.

The left column shows the definition of the five measures that are sensitive to
a motion boundary, and the right column displays their value along a scanline
in a random-dot image containing a translating square.

[Figure panels not reproduced. Surviving panel labels: Peak-ratio; Signal-noise-ratio.]

3.4 Inferring Boundaries

We  have  introduced  five  measures  that  are  sensitive  to  the  presence  of 
motion boundaries, because they each capture and monitor a different
characteristic of a motion boundary. The question arises of how and
when to infer a motion boundary so that few actual boundaries are
missed and few spurious ones are accepted. We will consider
thresholding and the detection of global extrema as ways to infer motion
boundaries. We will also address how well the detected boundaries are
localized. 


3.4.1  Thresholds  and  their  Derivation 


For  the  Ratio  Measures,  a  threshold  can  be  derived  by  calculating  their 
expected  value  as  a  function  of  the  histogram  neighborhood  radius  r  and 
the  distance  x  from  the  boundary  at  which  the  local  histogram  is 
computed.  We  assume  that  the  correct  flow  field  is  given  and  we  consider 
shearing³ and occluding motion, where d denotes the width of the area
occluded in the subsequent frame. Hence, the height of the two highest
peaks is equal to areas a and b of the circular support used to compute the
local  histograms. 

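\[
\text{peak-ratio} = \frac{\text{height of 2nd highest peak}}{\text{height of highest peak}} = \frac{b}{a}
\]

\[
\text{local-support-ratio} = \frac{\text{height of highest peak}}{\text{maximal local support}} = \frac{a}{c}
\]

where

\[
c = a + b + \mathit{occl} = \text{area of circle}
\]

\[
a = \pi r^2 - \left[ r^2 \arccos\!\left(\frac{x-d}{r}\right) - (x-d)\sqrt{r^2-(x-d)^2} \right], \quad \text{where } -r \le x-d \le r
\]

\[
b = r^2 \arccos\!\left(\frac{x}{r}\right) - x\sqrt{r^2-x^2}, \quad \text{where } 0 \le x \le r
\]

If $\mathit{occl} = 0$ then

\[
\text{peak-ratio} = \frac{1 - \text{local-support-ratio}}{\text{local-support-ratio}}
\]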

³ Shearing motion occurs when the relative movement between two objects is in the direction of
their  boundary  and  hence  no  occlusion  occurs. 

For shearing motion and radii 5, 8 and 10 pixels, the peak-ratio will be
0.34, 0.52 and 0.60, respectively, two pixels away from the boundary (see
Figure 3.5). For occluding motion, the peak-ratio will be maximal one
pixel away from the boundary (for explanation see section 3.4.3), and it
will be 0.23, 0.46 and 0.55, respectively, three pixels away from the
boundary. This leads us to use a threshold of 0.8 for the peak-ratio,
because this ensures that few actual boundaries are missed and few
spurious ones are accepted. We have obtained good results with
this threshold, regardless of the type of display or motion.


Figure 3.5 The Derivation of a Threshold for the Peak-Ratio.

The right and left panels show the expected value of the peak-ratio for the case of
shearing and occluding motion, respectively, and its value has been computed as a function
of  the  radius  r  =  5, 8, 10  of  the  circular  neighborhood  used  to  construct  the  histogram  and  as 
a  function  of  the  distance  x  from  the  boundary  at  which  the  local  histogram  has  been 
computed.  For  the  case  of  occluding  motion,  the  width  d  of  the  area  occluded  in  the  next 
frame  is  assumed  to  be  equal  to  two. 

Similar graphs can be computed for the local-support-ratio (see Figure
3.6). For occluding motion and radii 5, 8 and 10 pixels, the local-support-ratio
will be minimal one pixel away from the actual boundary, and it
will be 0.63, 0.58 and 0.56, respectively, three pixels away from the
boundary. This leads us to use a threshold that ranges between 0.45 and
0.6, where any value below the threshold is taken to indicate a boundary.
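
The expected values quoted for the peak-ratio and the local-support-ratio can be reproduced from the geometry of the circular support. The sketch below assumes that b is the circular segment beyond the boundary at distance x, that a is the circle minus the segment beyond the shifted boundary at x - d, and that the standard circular-segment area formula applies; the function names are hypothetical.

```python
import math


def segment_area(r, x):
    """Area of the part of a disc of radius r beyond a chord at signed distance x.

    Requires -r <= x <= r.
    """
    return r * r * math.acos(x / r) - x * math.sqrt(r * r - x * x)


def expected_ratios(r, x, d=0.0):
    """Return (peak_ratio, local_support_ratio) at distance x from a boundary.

    d is the width of the strip occluded in the next frame (0 for shearing).
    """
    c = math.pi * r * r            # maximal local support (area of circle)
    b = segment_area(r, x)         # support of the second highest peak
    a = c - segment_area(r, x - d) # support of the highest peak
    return b / a, a / c


if __name__ == "__main__":
    for radius in (5, 8, 10):
        shear_peak, _ = expected_ratios(radius, 2)
        occl_peak, occl_support = expected_ratios(radius, 3, d=2)
        print(radius, round(shear_peak, 2), round(occl_peak, 2), round(occl_support, 2))
```

Under these assumptions the sketch agrees with the figures in the text to two decimal places, for both shearing motion at two pixels and occluding motion (d = 2) at three pixels from the boundary.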

Figure 3.6 The Derivation of a Threshold for the Local-Support-Ratio.

The  right  and  left  panels  show  the  expected  value  of  the  local-support-ratio  for  the  case 
of  shearing  and  occluding  motion,  respectively,  and  its  value  has  been  computed  as  a 
function  of  the  radius  r  =  5,  8, 10  of  the  circular  neighborhood  used  to  construct  the 
histogram  and  as  a  function  of  the  distance  x  from  the  boundary  at  which  the  local 
histogram  has  been  computed.  For  the  case  of  occluding  motion,  the  width  d  of  the  area 
occluded  in  the  next  frame  is  assumed  to  be  equal  to  two. 

[Figure 3.6: two panels, Local-support-ratio (shearing) and
Local-support-ratio (occlusion), each plotted against distance from
boundary.]

For the signal-noise-ratio, a threshold can be derived by using the
following approximation. The signal-noise-ratio has been defined to be
equal to the ratio of the local support for the highest peak and its
neighbors, referred to as the "signal", and the total number of votes minus
the "signal". If we assume that the total number of votes is a multiple of
the area of the histogram neighborhood (see footnote 4), and that the
"signal" is a multiple of the height of the highest peak, then the
signal-noise-ratio will be equal to:

$$\text{signal-noise-ratio} = \frac{\text{signal}}{\text{total votes} - \text{signal}}$$

If total votes $= a \cdot c$ and signal $= \beta \cdot a$, then

$$\text{signal-noise-ratio} = \frac{\beta a}{a c - \beta a} = \frac{1}{\delta c - 1}, \quad \text{where } \delta = \frac{1}{\beta}.$$

If $\delta \approx 1$, then

$$\text{signal-noise-ratio} = \frac{\text{local-support-ratio}}{1 - \text{local-support-ratio}}$$

Footnote 4: This is equivalent to assuming that each point has, on
average, a certain number of potential displacements.

For occluding motion and radii of 5, 8 and 10 pixels, the
signal-noise-ratio will be minimal one pixel away from the actual boundary,
where it will be equal to 0.60, 0.73 and 0.77, respectively, and it will be
1.68, 1.38 and 1.29, respectively, three pixels away from the boundary
(see Figure 3.7). These values represent the upper bounds for the
signal-noise-ratio, and we will use, in general, a threshold of 0.6.
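
As a quick numerical check of the approximation above, the following sketch
evaluates local-support-ratio / (1 - local-support-ratio) for the
local-support-ratio values quoted for three pixels from the boundary. The
function name `snr_from_lsr` is illustrative only and is not part of the
original implementation.

```python
def snr_from_lsr(local_support_ratio: float) -> float:
    """Approximate the signal-noise-ratio as lsr / (1 - lsr)."""
    if not 0.0 <= local_support_ratio < 1.0:
        raise ValueError("local-support-ratio must lie in [0, 1)")
    return local_support_ratio / (1.0 - local_support_ratio)


# Local-support-ratio values quoted in the text (three pixels from the
# boundary, occluding motion), keyed by neighborhood radius in pixels.
lsr_three_pixels = {5: 0.63, 8: 0.58, 10: 0.56}

for radius, lsr in lsr_three_pixels.items():
    print(f"r = {radius:2d}: lsr = {lsr:.2f} -> snr = {snr_from_lsr(lsr):.2f}")
```

The results, roughly 1.70, 1.38 and 1.27, are close to the quoted 1.68,
1.38 and 1.29; the small differences are consistent with the two-digit
rounding of the inputs.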


Figure 3.7 The Derivation of a Threshold for the Signal-Noise-Ratio.

The right and left panels show the expected value of the signal-noise-ratio
for the case of shearing and occluding motion, respectively. Its value has
been computed as a function of the radius r = 5, 8, 10 of the circular
neighborhood used to construct the histogram and as a function of the
distance x from the boundary at which the local histogram has been
computed. For the case of occluding motion, the width d of the area
occluded in the next frame is assumed to be equal to two.

For the chi-square and the Kolmogorov-Smirnov measure, a confidence
level can be derived. For example, the confidence level for the
Kolmogorov-Smirnov measure will be roughly 0.1. This confidence
level, however, is too low to be used to localize the motion boundaries
for the following reason. The Kolmogorov-Smirnov measure can be
above this confidence level even for points that lie in a translating region
because the matching scores of the potential displacements associated
with the matching primitives can be sufficiently different. We will
therefore use a threshold between 0.4 and 0.6 to detect and localize motion
boundaries, and reasonable results have been obtained with this choice.

3.4.1.1 How to Sharpen the Response of the Ratio Measures

If we use, as mentioned in section 3.2.2, a Gaussian spatial support
function with spread sigma_s that gives less weight to contributing points
that are farther away from the point at which the histogram is computed,
then the response of the ratio measures is sharpened (see Figure 3.8).
The smaller sigma_s, the sharper the response. The next figure shows
the resulting responses for sigma_s = 5, 25 and ∞ (which is equivalent to
weighing all contributing points equally), where r = 8 and we consider
occluding motion.
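
The spatial weighting can be sketched as follows. This is a minimal
illustration under stated assumptions: the weight is taken to be
exp(-d^2 / (2 sigma_s^2)) for a point at distance d from the center, and
the function name `gaussian_support_weights` is hypothetical. The exact
normalization used in the original implementation is not given in the text.

```python
import math


def gaussian_support_weights(radius: int, sigma_s: float) -> dict[tuple[int, int], float]:
    """Weights for offsets (dx, dy) inside a circular neighborhood.

    An infinite sigma_s gives every contributing point the same weight 1.0.
    """
    weights: dict[tuple[int, int], float] = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            d2 = dx * dx + dy * dy
            if d2 > radius * radius:
                continue  # outside the circular neighborhood
            if math.isinf(sigma_s):
                weights[(dx, dy)] = 1.0
            else:
                weights[(dx, dy)] = math.exp(-d2 / (2.0 * sigma_s ** 2))
    return weights


# Compare the weight at the center with the weight at the rim for r = 8.
for sigma_s in (5.0, 25.0, math.inf):
    w = gaussian_support_weights(radius=8, sigma_s=sigma_s)
    print(sigma_s, round(w[(0, 0)], 3), round(w[(8, 0)], 3))
```

With sigma_s = 5 the rim of the neighborhood contributes far less than the
center, whereas with sigma_s = ∞ all points contribute equally, which is
why a smaller sigma_s concentrates the response near the boundary.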

Figure 3.8 Sharpening the Response of the Ratio Measures.

The right and left panels show the expected value of the peak-ratio and
local-support-ratio, respectively, if a Gaussian spatial support function
with sigma = 5, 25 or ∞ is used to give less weight to contributing points
that are farther away from the point at which the histogram is computed.
Occluding motion is assumed and the radius r of the circular neighborhood
used to construct the histogram is equal to eight. The smaller sigma, the
sharper the response of the two measures.

3.4.2 The Detection of Global Extrema

As  Figure  3.4  has  shown,  all  the  proposed  measures  have  a  global 
extremum  in  the  vicinity  of  a  motion  boundary.  There  are  at  least  two 
ways  in  which  the  presence  of  motion  boundaries  can  be  inferred  via 
these  global  extrema.  First,  a  boundary  can  be  inferred  where  the  first 
derivative  of  the  peak-ratio,  for  example,  is  zero,  its  second  derivative  is 
negative,  and  where  this  ratio  is  above  some  minimal  threshold.  This 
minimal  threshold  is  chosen  so  that  any  extremum  below  it  can  be  safely 
excluded. 
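
For a one-dimensional profile taken across a boundary, the first approach
can be sketched as below. This is an illustration only: the discrete
derivative test, the function name `boundary_candidates` and the sample
profile values are assumptions and do not come from the original
implementation.

```python
def boundary_candidates(profile: list[float], min_value: float) -> list[int]:
    """Indices of local maxima of a 1-D ratio profile that reach min_value.

    A discrete maximum is a sample where the difference to the previous
    sample is positive and the difference to the next sample is not, i.e.
    the first derivative crosses zero with a negative second derivative.
    """
    hits = []
    for i in range(1, len(profile) - 1):
        rising = profile[i] - profile[i - 1] > 0
        falling = profile[i + 1] - profile[i] <= 0
        if rising and falling and profile[i] >= min_value:
            hits.append(i)
    return hits


# Made-up peak-ratio samples along a line crossing a motion boundary.
peak_ratio = [0.20, 0.25, 0.55, 0.92, 0.60, 0.30, 0.35, 0.28]
print(boundary_candidates(peak_ratio, min_value=0.8))
```

In this example the small maximum at index 6 is rejected by the minimal
threshold, and only the maximum at index 3 is kept.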

Second,  the  measures  have  in  common  that  they  have  a  global 
extremum  at  a  motion  boundary,  and  that  their  local  extrema  anywhere 
else  in  the  image  are  weakly  correlated  with  each  other.  Hence,  the 
extrema  contours  can  be  used  in  the  following  way  to  locate  the  motion 
boundaries,  without  having  to  use  any  thresholding.  First,  the  extrema 
contours  are  computed  by  differentiation.  These  contours  are  then 
thickened  by  some  number  of  pixels  because  the  extrema  of  the  different 
measures  are  not  perfectly  localized  and  can  be  shifted  with  respect  to 
each other at a motion boundary. Finally, these thickened contours are
superimposed, and a motion boundary is inferred where they all intersect
(see  Figure  6.2).  This  approach  of  combining  the  extrema  contours  to 
detect  boundaries  has  the  attractive  feature  that  it  does  not  require  the 
setting  of  a  threshold.  The  motion  boundaries  are  inferred  by 
corroborating  the  information  provided  by  the  different  measures,  and 
good  results  have  been  obtained. 
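
The combination of thickened extrema contours can be sketched with sets of
pixel coordinates. This is a minimal sketch under assumptions: contours are
given as sets of (x, y) pixels, thickening uses a square neighborhood, and
the names `thicken` and `corroborated_boundary` are hypothetical.

```python
Pixel = tuple[int, int]


def thicken(contour: set[Pixel], margin: int) -> set[Pixel]:
    """Grow a set of contour pixels by `margin` pixels in every direction."""
    return {
        (x + dx, y + dy)
        for (x, y) in contour
        for dx in range(-margin, margin + 1)
        for dy in range(-margin, margin + 1)
    }


def corroborated_boundary(contours: list[set[Pixel]], margin: int) -> set[Pixel]:
    """Pixels where the thickened extrema contours of all measures overlap."""
    thick = [thicken(c, margin) for c in contours]
    return set.intersection(*thick) if thick else set()


# Three measures whose extrema are shifted by one pixel with respect to
# each other; the second measure also has a spurious extremum at (3, 3).
measure_a = {(10, y) for y in range(5)}
measure_b = {(11, y) for y in range(5)} | {(3, 3)}
measure_c = {(9, y) for y in range(5)}
boundary = corroborated_boundary([measure_a, measure_b, measure_c], margin=1)
print(sorted(boundary))
```

The spurious extremum is dropped because the other measures do not
corroborate it, and no threshold on the measures themselves is needed.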

3.4.2.1  Hysteresis 

The  problem  with  setting  a  fixed  threshold  is  that  it  can  cause  the  detected 
boundaries  to  streak.  Streaking  occurs  when  the  peak-ratio,  for  example, 
fluctuates  above  and  below  the  threshold  of  our  choice  along  a  motion 
boundary.  To  reduce  the  likelihood  of  streaking,  the  thresholding 
approach  could  be  improved  by  using  hysteresis  [8].  We  could  use  two 
thresholds, a high and a low one. A high threshold of 0.9 is chosen to
ensure that any point on a local maxima contour of the peak-ratio above
this threshold is, with high probability, a motion boundary.
A low threshold of 0.6 is chosen so that the probability is low that a
motion boundary is missed. If any point of a local maxima contour is
above the high threshold, then that point is immediately accepted, as is
the entire connected segment of the contour which contains the point
and lies above the low threshold. The likelihood of streaking could
thereby be greatly reduced, because for a contour to be broken it must now
fluctuate above the high and below the low threshold. Also, the probability
that false motion boundaries are marked is reduced because the high
threshold can be raised without risking streaking. If streaking still occurs,
then these gaps can be filled by the methods introduced in Chapter 5.
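
The hysteresis rule can be sketched for values sampled along a single
contour. This is an illustration, assuming the contour has already been
traced into an ordered list; the function name `hysteresis_1d` and the
sample values are made up for the example.

```python
def hysteresis_1d(values: list[float], low: float, high: float) -> list[bool]:
    """Accept each run of values above `low` that contains a value above `high`."""
    accepted = [False] * len(values)
    i = 0
    while i < len(values):
        if values[i] <= low:
            i += 1
            continue
        j = i
        while j < len(values) and values[j] > low:
            j += 1  # extend the connected run that stays above the low threshold
        if any(v > high for v in values[i:j]):
            for k in range(i, j):
                accepted[k] = True
        i = j
    return accepted


# Made-up peak-ratio values along a local maxima contour.
peak_ratio_along_contour = [0.95, 0.85, 0.70, 0.92, 0.50, 0.75, 0.65]
print(hysteresis_1d(peak_ratio_along_contour, low=0.6, high=0.9))
```

A single threshold of 0.9 would accept only the first and fourth samples
and leave a gap between them, whereas the two-threshold rule accepts the
first four samples as one unbroken segment and still rejects the last run,
which never exceeds the high threshold.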

3.4.3  Localization 

The localization of a motion boundary is affected, firstly, by the curvature
of the boundary with respect to the size of the neighborhood used to
compute the local histograms; corners, for example, will get rounded.
Secondly, regions occluded in the next frame will cause the estimated
boundary to lie midway between the location of the actual object
boundary in the first frame and its location in the next frame (as shown in
the derivation of the thresholds). This is because the occluded matching
primitives will not have a match in the next frame, and only midway
between the locations of the actual object boundary in the first and second
frame are the consistent contributions from the two sides of the boundary
roughly equal. The detected motion boundary should, however, coincide
with the actual boundary if a region appears next to it in the subsequent
frame, because the matching primitives on either side will have a match
in the next frame.

3.4.3.1  Figure-Ground  Separation 

The fact that the estimated boundary can lie midway between the actual
object boundary in the first and second frame could be used to infer the
side of a motion boundary that corresponds to the occluding object. If the
order of the frames is reversed, then the regions that disappeared
previously will now come into view, and the estimated boundary will be
correctly localized there. Similarly, the estimated motion boundaries,
where previously regions came into view, will now be shifted in the
direction of the relative motion between the occluding object and the
background. Hence, we can compute to which side a boundary has moved
by comparing where the estimated boundary happens to lie with respect
to the boundary that was estimated by reversing the order of the frames.
We refer to this displacement of the boundary as v_b. We have to consider
the velocities on the two sides of a boundary in order to be able to infer
which side of the motion boundary is closer to the viewer. As will be
outlined in Chapter 4, the highest peak in the local histograms of the
potential displacements estimates the image flow at each point. Now, the
occluding object will move in the same direction as the motion
boundary, i.e. the scalar product of their flow vectors has to be positive.
Hence, if the scalar product of v_b and the difference vector between the
velocity to the right, v_r, and to the left of the boundary, v_l, is positive,
i.e. v_b · (v_r - v_l) > 0, then the occluding object is to the right of the
detected boundary. Similarly, a negative scalar product implies that the
side to the left of the detected motion boundary is closer to the viewer. If
there is no dynamic occlusion occurring, then v_b will be zero, and the
local inference of which side corresponds to the occluding object becomes
difficult.

3.5 The Dynamic Occlusion Method

In this section, we show how dynamic occlusion can be used to estimate
motion boundaries at a stage prior to the computation of visual motion.
Specifically, we want to develop a method that can locally compute the
appearance and disappearance of simple features in a way that is
sufficient to estimate boundaries, without having to solve a global and
difficult correspondence problem.

3.5.1  Dynamic  Occlusion  of  Thin-Bars 

Certain spatial relationships between simple image features change most
dramatically in the vicinity of a boundary in the presence of motion. In
particular, zero-crossings (see footnote) of opposite contrast will move
closer together or farther apart. They may even disappear or come into
view. Hence, pairs of zero-crossings of opposite contrast will be created or
destroyed in the vicinity of a boundary. We will refer to these pairs as
thin-bars because they can correspond to thin bars of constant intensity in
the image. We define a pair of zero-crossings of opposite contrast to
constitute a thin-bar if they are separated by less than 3 sigma, where
sigma refers to the spread of the Gaussian used to smooth the image. The
appearance or disappearance of the thin-bars can be used to construct a
method that locally estimates motion boundaries.
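
The thin-bar definition can be illustrated on a single scan line. This is a
sketch under simplifying assumptions: zero-crossings are reduced to a
position and a contrast sign, only adjacent zero-crossings are paired, and
the names `ZeroCrossing` and `find_thin_bars` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroCrossing:
    x: float       # position along a scan line, in pixels
    contrast: int  # +1 or -1, the sign of the intensity change


def find_thin_bars(zcs: list[ZeroCrossing], sigma: float) -> list[tuple[ZeroCrossing, ZeroCrossing]]:
    """Pair adjacent zero-crossings of opposite contrast closer than 3 * sigma."""
    ordered = sorted(zcs, key=lambda z: z.x)
    bars = []
    for a, b in zip(ordered, ordered[1:]):
        if a.contrast == -b.contrast and (b.x - a.x) < 3.0 * sigma:
            bars.append((a, b))
    return bars


scan_line = [
    ZeroCrossing(0.0, +1),
    ZeroCrossing(4.0, -1),   # 4 pixels from the previous one: a thin-bar
    ZeroCrossing(20.0, +1),  # opposite contrast, but too far away
    ZeroCrossing(21.0, +1),  # close, but same contrast
]
print(find_thin_bars(scan_line, sigma=2.0))
```

With sigma = 2 the separation limit is 6 pixels, so only the first pair
qualifies as a thin-bar.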

When tracking a thin-bar, we do not attempt to solve the correspondence
problem completely, since we only check for the existence of a
matching thin-bar instead of trying to determine the correct and unique
match. The disappearance of a thin-bar will only be concluded if no
corresponding thin-bar can be found in the next frame that satisfies the
constraints outlined below. As Figure 6.5 shows, this is sufficient to
estimate motion boundaries. The appearance of a thin-bar is detected by
using the fact that the appearance of a thin-bar is equivalent to the
disappearance of a thin-bar when the order of the frames is reversed.

Footnote: Zero-crossings correspond to sharp changes in intensity detected
by filtering the image with the Laplacian of a Gaussian.

A matching thin-bar has to satisfy the following constraints. First,
corresponding zero-crossings must have the same contrast, i.e., the scalar
product of the intensity gradients at the locations of the zero-crossings
must be positive. Similarly, the angle between the intensity gradients at
the locations of corresponding zero-crossings should be within a certain
bound for small rotations. Second, the direction of the measured normal
flow component constrains the motion of a zero-crossing to within 180°.
More specifically, the direction and magnitude of the measured normal
flow component define a band within which the matching thin-bar has
to lie (see Figure 3.3). The dimensions of the band are chosen to account
for measurement errors. This constraint greatly reduces the number of
potentially matching thin-bars. Third, we can define a spatial ordering for
a thin-bar, since the first zero-crossing will be either to the right or to the
left of the second zero-crossing. This spatial relationship or
ordering is not likely to change as a thin-bar moves, because the two
partners are spatially close and their flow vectors are therefore roughly
equal (see Figure 3.9).
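
The three constraints can be sketched as simple predicates. This is an
illustrative reading and not the original implementation: the band is
modeled as a tolerance `tol` on the displacement measured along the
normal-flow direction, the angle bound `max_angle_deg` is an arbitrary
choice, and all names (`ZC`, `same_contrast`, `in_normal_flow_band`,
`ordering_preserved`, `bars_match`) are hypothetical.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float]


@dataclass(frozen=True)
class ZC:
    pos: Vec   # image position of the zero-crossing
    grad: Vec  # intensity gradient at that position


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]


def same_contrast(a: ZC, b: ZC, max_angle_deg: float = 30.0) -> bool:
    """Gradients have a positive scalar product and a bounded angle between them."""
    d = dot(a.grad, b.grad)
    if d <= 0.0:
        return False
    cos_angle = d / (math.hypot(*a.grad) * math.hypot(*b.grad))
    return cos_angle >= math.cos(math.radians(max_angle_deg))


def in_normal_flow_band(a: ZC, b: ZC, normal_flow: Vec, tol: float = 1.0) -> bool:
    """The displacement a -> b along the normal-flow direction is within
    `tol` of the measured normal-flow magnitude."""
    mag = math.hypot(*normal_flow)
    disp = (b.pos[0] - a.pos[0], b.pos[1] - a.pos[1])
    if mag == 0.0:
        return math.hypot(*disp) <= tol
    along = dot(disp, normal_flow) / mag
    return abs(along - mag) <= tol


def ordering_preserved(bar1: tuple[ZC, ZC], bar2: tuple[ZC, ZC]) -> bool:
    """The second zero-crossing stays on the same side of the first one."""
    d1 = (bar1[1].pos[0] - bar1[0].pos[0], bar1[1].pos[1] - bar1[0].pos[1])
    d2 = (bar2[1].pos[0] - bar2[0].pos[0], bar2[1].pos[1] - bar2[0].pos[1])
    return dot(d1, d2) > 0.0


def bars_match(bar1: tuple[ZC, ZC], bar2: tuple[ZC, ZC], flows: tuple[Vec, Vec]) -> bool:
    """True if bar2 (next frame) satisfies all three constraints for bar1."""
    return (
        all(same_contrast(a, b) for a, b in zip(bar1, bar2))
        and all(in_normal_flow_band(a, b, f) for a, b, f in zip(bar1, bar2, flows))
        and ordering_preserved(bar1, bar2)
    )
```

A thin-bar would then be reported as having disappeared only if
`bars_match` is false for every candidate thin-bar in the next frame, which
mirrors the existence check described above without selecting a unique
match.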

Figure  3.9  The  Spatial  Ordering  Constraint. 

If the zero-crossing with a negative contrast, z-c [-], moves with v_1, then
the spatial ordering between the two zero-crossings of opposite contrast
will remain intact in Frame 2. But if it were to move with v_2, then the
spatial ordering between z-c [-] and z-c [+] would be violated.


[Figure 3.9 labels: Frame 1; z-c [-], z-c [+]. Matching constraints:
corresponding zero-crossings have (i) the same contrast, (ii) a match that
lies on the line defined by the normal flow component, and (iii) their
spatial ordering preserved.]




The Dynamic Occlusion Method requires that the image be finely
textured, because otherwise thin-bars will appear or disappear only in a
few places. Furthermore, this method can have false alarms when a
surface rotates in depth or, for perspective projection, when a plane
moves towards or away from the viewer. This will cause the distance
between zero-crossings to increase or decrease, and it can thereby
accidentally create or destroy thin-bars. In the case of a rotating cylinder,
dynamic occlusion and effects due to rotation in depth are confounded,
but the thin-bars are still being created or destroyed only in the vicinity of
the boundary of the cylinder. Despite these shortcomings, the reason for
developing this method has been to show that the dynamic occlusion of
these simple features can be computed locally in a way that is sufficient to
estimate boundaries at a stage prior to the computation of visual motion,
without having to solve a global correspondence problem (for results see
Chapter 6).


Chapter 4

The  Local  Estimation  of  Visual 
Motion 


In this chapter we show how visual motion can be locally estimated as a
by-product of the early estimation of motion boundaries. The local
histograms of the potential displacements can be used to compute a dense
image flow field, because the local histograms have their highest peak at
the displacement that received the most local support. Hence, this
displacement represents an estimate of the image flow. Furthermore, the
ratio of the two highest peaks or "strongest contenders" reflects how good
the estimate is. A low peak-ratio implies a good estimate, whereas a
peak-ratio close to one implies the presence of a motion boundary and,
likewise, that the estimated image flow might be inaccurate.

It is worth noting the following. Firstly, the estimated motion boundaries
are not incorporated in the computation of visual motion discussed
here. These early estimates of the image flow field and its discontinuities
could be integrated in a later computation. Secondly, the local
estimation of visual motion will be difficult in image regions with only
little texture, as is the case for the early estimation of motion boundaries.
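
Reading the flow estimate and its reliability from a local histogram, as
described at the start of this chapter, can be sketched as follows. The
sketch assumes the histogram is a mapping from displacement to support and
treats the two largest bins as the two highest peaks; the function name
`flow_and_peak_ratio` and the vote counts are made up for illustration.

```python
def flow_and_peak_ratio(histogram: dict[tuple[int, int], float]) -> tuple[tuple[int, int], float]:
    """Return (best displacement, ratio of second-highest to highest support)."""
    if not histogram:
        raise ValueError("empty histogram")
    ranked = sorted(histogram.items(), key=lambda kv: kv[1], reverse=True)
    best_disp, best_votes = ranked[0]
    second_votes = ranked[1][1] if len(ranked) > 1 else 0.0
    ratio = second_votes / best_votes if best_votes > 0 else 1.0
    return best_disp, ratio


# Inside a translating region: one dominant displacement, low peak-ratio.
interior = {(1, 0): 40.0, (0, 0): 5.0, (-1, 0): 3.0}
# Near a boundary: two displacements compete, peak-ratio close to one.
near_boundary = {(1, 0): 22.0, (-1, 0): 20.0, (0, 0): 4.0}

print(flow_and_peak_ratio(interior))
print(flow_and_peak_ratio(near_boundary))
```

In the second case the peak-ratio is about 0.91, above the peak-ratio
threshold of 0.8 used in section 3.4.1, so the flow estimate there would be
treated with caution.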


Local support or voting schemes have been used by, for example,
Stevens (1977) [39], Fennema & Thompson (1979) [12], Prazdny (1984) [33],
Bandopadhay & Dutta (1986) [4] and Bülthoff, Little & Poggio (1989) [7] to
compute disparity and displacement fields. These methods, however, do
not compute and analyze the full histogram of the possible displacements
to detect the presence of boundaries.

In this chapter we show that the method proposed in this thesis for
computing visual motion, using the local histograms of the potential
displacements, is well-posed. Furthermore, we show that the proposed
method is similar to the local voting scheme developed by Bülthoff, Little
& Poggio. The two methods might appear to be different because of
nomenclature and, more importantly, because of their main goals.
Bülthoff et al. are primarily interested in estimating the image flow field
and assume that motion boundaries should be detected at a later stage,
whereas we are primarily interested in demonstrating that the detection
of motion boundaries can be decoupled from the computation of the
image flow field and that it can be performed using only motion
information and no intensity boundaries.

Both methods assume that the image flow field can be approximated
locally as constant, and both use a small circular neighborhood at each
point to determine the displacement with the most votes. The main
difference is that the votes for each possible displacement are recorded in a
local histogram by our method, whereas Bülthoff et al. are only interested
in the displacement with the most votes. Hence, our method computes a
more general representation, which can be used to detect motion
boundaries and estimate visual motion in parallel. Another difference
lies in the comparison function used to determine the pointwise match
between intensities in subsequent frames.

4.1 Mathematical Formulation


The computation of the visual flow field is locally underconstrained, and
in order to make it well-posed we need to add a constraint to compute the
smoothest flow field which matches the data [7]. When the projected
motion of objects is small relative to the image size, we can restrict the
search for corresponding points to small regions in the image. Using a
formulation similar to the one used by Bülthoff et al. [7], we look for a
discrete image flow field $V(x,y) = (u(x,y), v(x,y))$, with
$u, v \in [-\mu, +\mu]$, to minimize:

$$\int \Big[ \Omega\big(E_t(x,y),\, E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t)\big) + \lambda \Big( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \Big) \Big] \, dx \, dy \qquad (1)$$

where $E_t(x,y)$ denotes the image brightness or intensity at (x,y) at
time t, $\Omega$ is a comparison function which measures the pointwise match
between subsequent frames, $\lambda$ weights the smoothness term, and $\mu$
denotes the maximal expected displacement in the x and/or y dimension.

We construct the image flow field pointwise: for each displacement,
every point evaluates a comparison function at that displacement and
then sums the match scores over the circular neighborhood $C_r$. Each
point chooses the displacement with maximal support out of the finite
set of possible displacements. The resulting image flow field is the union
of these pointwise displacements.

We simplify and approximate equation (1) by using the constraint that
the image flow field can be assumed to be locally constant in the small
neighborhood $C_r$ used at each point to compute the local support for the
different possible displacements. We choose the neighborhood $C_r$ to be
circular with a radius r that is dependent on the distance to the objects in
the scene and their expected size in the image. The choice of $\mu$ depends
on the maximal expected velocities of objects in the scene, their distances
from the camera, and the time separation $\Delta t$ between frames. The time
separation $\Delta t$ is small and therefore the resulting image displacements
will be small with respect to the image size. Hence, we are dealing with
short range motion.

The second-order term of equation (1) vanishes because of the local
translation assumption. The simplified and approximated equation (1)
now minimizes, in each overlapping circular neighborhood $C_r(x,y)$ with
radius r:

$$\sum_{(x,y) \in C_r} \Omega\big(E_t(x,y),\, E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t)\big) \qquad (2)$$

As mentioned in section 3.1.1, we use a Gaussian matching function,
which depends on the difference in intensity, to measure the pointwise
match between subsequent frames and to account for the changes in
intensity that occur. The smaller the difference in intensity, the larger the
weight that is assigned to a particular displacement. The parameter
$\beta$, which controls the spread of the Gaussian matching function, can be
chosen to reflect the estimated noise in the intensity measurements.
Hence, in our case the comparison function $\Omega$ is equal to:

$$\Omega\big(E_t(x,y),\, E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t)\big) = -\, e^{-\beta \left( E_t(x,y) - E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t) \right)^2} \qquad (3)$$

whereas Bülthoff et al. use
$\Omega\big(E_t(x,y),\, E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t)\big) = \big(E_t(x,y) - E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t)\big)^2$.

We can substitute equation (3) into equation (2) and absorb the minus
sign by turning the minimization into a maximization. Hence, the visual
flow vector of a pixel is computed by maximizing, over all $(u,v)$ with
$u, v \in [-\mu, +\mu]$:

$$\sum_{(x,y) \in C_r} e^{-\beta \left( E_t(x,y) - E_{t+\Delta t}(x+u\Delta t,\, y+v\Delta t) \right)^2} \qquad (4)$$

The local neighborhoods used to estimate the image flow field are
overlapping from pixel to pixel. Each pixel, surrounded by its
neighborhood $C_r$ with radius r, independently chooses the image flow
vector to maximize matching in its neighborhood. We do not match
intensities directly, since the presence of noise makes the process
unstable. Instead, we choose the displacement that maximizes (4), which
in turn regularizes the solution of the matching computation [7].
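
The maximization in (4) can be sketched for a single pixel as below. This
is a minimal illustration and not the Connection Machine implementation:
frames are nested lists of intensities, the frame separation is taken as
one so that (u, v) is a displacement in pixels, points falling outside the
image are skipped, and the parameter values and function names
(`local_support`, `estimate_flow`) are assumptions.

```python
import math
import random

Image = list[list[float]]


def circular_offsets(radius: int) -> list[tuple[int, int]]:
    """Offsets (dx, dy) inside a circular neighborhood of the given radius."""
    return [(dx, dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dx * dx + dy * dy <= radius * radius]


def local_support(frame1: Image, frame2: Image, x: int, y: int,
                  u: int, v: int, radius: int, beta: float) -> float:
    """Sum of exp(-beta * (E_t - E_{t+dt})^2) over the neighborhood of (x, y)."""
    h, w = len(frame1), len(frame1[0])
    total = 0.0
    for dx, dy in circular_offsets(radius):
        px, py = x + dx, y + dy
        qx, qy = px + u, py + v
        if 0 <= px < w and 0 <= py < h and 0 <= qx < w and 0 <= qy < h:
            diff = frame1[py][px] - frame2[qy][qx]
            total += math.exp(-beta * diff * diff)
    return total


def estimate_flow(frame1: Image, frame2: Image, x: int, y: int,
                  mu: int = 2, radius: int = 3, beta: float = 0.01):
    """Return the displacement with maximal local support and the histogram."""
    histogram = {(u, v): local_support(frame1, frame2, x, y, u, v, radius, beta)
                 for u in range(-mu, mu + 1) for v in range(-mu, mu + 1)}
    best = max(histogram, key=histogram.get)
    return best, histogram


# Synthetic random-dot texture shifted one pixel to the right.
rng = random.Random(0)
frame1 = [[float(rng.randint(0, 255)) for _ in range(16)] for _ in range(16)]
frame2 = [[frame1[y][x - 1] if x >= 1 else 0.0 for x in range(16)]
          for y in range(16)]
best, histogram = estimate_flow(frame1, frame2, x=8, y=8)
print(best)
```

Keeping the full `histogram` rather than only `best` is what allows the
same computation to feed both the flow estimate and the boundary measures
of Chapter 3.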

4.2  Advantages  and  Relationship  to  Human  Psychophysics 

Like the method by Bülthoff et al. [7], this way of estimating the image
flow field has several attractive features. First, noise is reduced by the
local neighborhoods used to find the displacement with the most local
support. Second, it does not rely on the numerical precision of
derivatives, which makes it more robust. Third, this approach
computes a dense image flow field, removing the necessity of
interpolating or smoothing the estimated flow field.

Bülthoff et al. [7] have demonstrated that the approach of using local
neighborhoods to find the displacement with the most local support is
consistent with human psychophysics, since it exhibits several of the
same "illusions" that humans perceive, such as the "barberpole", the
"non-rigidity", the "motion-capture" and the "Wallach's aperture"
illusions.


Chapter 5

Extracting  Complete  and  Unique 
Contours 


5.1  Introduction 

The pointwise output of the motion boundary estimators is often broadly
localized and it can contain gaps. Hence, we have to find a way to extract
single and unique boundaries without gaps. We apply and modify the
Structural Saliency Method developed by Sha'ashua & Ullman [37,45] to
achieve this goal.

Sha'ashua and Ullman have proposed two different kinds of saliency
measures: local saliency and structural saliency. An edge's local saliency is
determined by attributes of that edge alone, and in our case local saliency
is equal to the magnitude of the output of the motion boundary
estimators. Structural saliency refers to more global properties of an edge,
namely its relationships with other edges, and often this saliency is a
property of the structure as a whole, whereas the parts of the structure are
not necessarily salient in isolation.

5.2  The  Structural  Saliency  Method 

The  Structural  Saliency  Method  employs  a  simple  iterative  network  and 
uses  an  optimization  approach  to  produce  a  "saliency  map",  which 
emphasizes  salient  locations  in  the  image.  The  saliency  of  curves  is 
measured  in  terms  of  their  smoothness  and  length,  which  is  often 
sufficient  to  perform  a  figure-ground  separation.  The  main  properties  of 
the  network  are:  (i)  the  computations  are  simple  and  local,  (ii)  globally 
salient  structures  emerge  with  a  small  number  of  iterations,  (iii)  there  is 
little  dependence  on  the  complexity  of  the  image,  (iv)  contours  are 
smoothed,  gaps  are  filled  in  and  linking  information  between  edge 
segments  is  provided. 


How does it work?

5.2.1 Detailed Description

A structural saliency measure $\Phi$ is computed by a locally connected
network of processing elements. The image is represented by a network of
n x n grid points, where each point represents a specific (x, y) location in
the image. At each point P there are k orientation elements coming into P
from neighboring points, and the same number of orientation elements
leaving P to nearby points (in the current implementation k is equal to
16, providing a reasonable angular resolution). Each orientation element
$p_i$ responds to the output of the motion boundary estimators by signalling
the presence of the corresponding motion boundary in the image, so that
those elements that do not have an underlying line segment are
associated with an empty area or gap in the image. We refer to a
connected sequence of orientation elements $p_{i+1}, \ldots, p_{i+n}$, each
element representing a line segment or a gap (called a virtual element), as
a curve of length n. The optimization problem is formulated as maximizing
$\Phi_n$ over all curves of length n starting from $p_i$.

An exhaustive enumeration of all combinations of $p_{i+1}, \ldots, p_{i+n}$
would require an exponential search space of size $k^n$ for each element in
the network.

The computation becomes linear in n if we use an extensible function $\Phi$
to measure saliency:

$$\max_{S^n(p_i)} \Phi_n(p_i, \ldots, p_{i+n}) = \max_{p_{i+1} \in S(p_i)} \Phi_1\Big(p_i,\, \max_{S^{n-1}(p_{i+1})} \Phi_{n-1}(p_{i+1}, \ldots, p_{i+n})\Big)$$

where $S^n(p_i)$ is the set of all possible curves of length n starting from
$p_i$.

Hence, the maximal curve of length n at P will be equal to the maximum
over all possible segments leaving P and the maximal curves of length
(n-1) starting at the respective end-points of these segments.

It  is  worth  noting  that  the  optimal  contour  through  P  does  not 
necessarily  extend  itself  as  the  iterations  proceed.  In  fact,  the  optimal 
curve  at  stage  n+1  can  be  different  from  the  optimal  curve  at  stage  n. 

Further, the saliency measure is associated with each element, not with
the entire curve.

The structural saliency $E_i$ is equal to the weighted contributions of the
local saliency values along the curve. Each weight is a product of two
factors. The first factor is inversely related to the number of virtual
elements (i.e. gaps) along $p_i, \ldots, p_j$, and the second factor is inversely
related to the total curvature of the curve. Curves that have a high
structural saliency value are long curves that are as straight as possible
and have the least number of gaps (for an in-depth description, see
Sha'ashua 1988 and Sha'ashua & Ullman 1989 [36,37,45]).


•  Structunl 
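The structural saliency E_i is updated by the following computation:

$$E_i^{(0)} = \sigma_i$$

$$E_i^{(n+1)} = \sigma_i + \rho_i \max_{p_j \in S(p_i)} E_j^{(n)} f_{i,j}$$

and it can be shown by induction on the length of the curve that

$$E_i^{(n)} = \sum_{j=i}^{i+n} \sigma_j \, C_{i,j} \, \rho_{i,j}$$

where

$$C_{i,j} = \prod_{k=i}^{j-1} f_{k,k+1} = e^{-\sum_{k=i}^{j-1} \frac{2 \alpha_k \tan(\alpha_k / 2)}{\Delta S}}$$

$$\rho_{i,j} = \prod_{k=i}^{j-1} \rho_k, \quad \text{where } \rho_k = \begin{cases} 1 & \text{if } p_k \text{ is active} \\ \rho < 1 & \text{if } p_k \text{ is virtual} \end{cases}$$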
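The iterative update can be illustrated with a minimal sketch, which is not the thesis implementation. It assumes a small hypothetical network in which every element has k = 2 successors, and it uses a stand-in coupling function in place of the curvature factor. The names `iterate_saliency`, `curve_score` and `best_curve_score` are introduced here for illustration only. The sketch checks numerically that n local update steps give the same value as an exhaustive search over all k^n curves.

```python
import math

def iterate_saliency(sigma, rho, neighbors, coupling, n_iter):
    """Apply E_i <- sigma_i + rho_i * max_j E_j * f(i, j) for n_iter steps."""
    energy = list(sigma)  # E^(0) is the local saliency
    for _ in range(n_iter):
        energy = [
            sigma[i] + rho[i] * max(energy[j] * coupling(i, j) for j in neighbors[i])
            for i in range(len(sigma))
        ]
    return energy

def curve_score(curve, sigma, rho, coupling):
    """Weighted sum of the local saliencies along one explicit curve."""
    total, weight = 0.0, 1.0
    for here, nxt in zip(curve, curve[1:]):
        total += sigma[here] * weight
        weight *= rho[here] * coupling(here, nxt)
    return total + sigma[curve[-1]] * weight

def best_curve_score(start, n, sigma, rho, neighbors, coupling):
    """Exhaustive maximum over all k^n curves of n steps (for checking only)."""
    curves = [[start]]
    for _ in range(n):
        curves = [c + [j] for c in curves for j in neighbors[c[-1]]]
    return max(curve_score(c, sigma, rho, coupling) for c in curves)

def coupling(i, j):
    """Hypothetical stand-in for the curvature factor f_{i,j}."""
    return math.exp(-0.1 * abs(i - j))

# Hypothetical toy network: 4 elements, each with 2 successors.
sigma = [1.0, 0.0, 1.0, 1.0]   # element 1 is virtual (a gap)
rho = [1.0, 0.7, 1.0, 1.0]     # attenuation below 1 at the gap
neighbors = {0: [1, 2], 1: [2, 3], 2: [3, 0], 3: [0, 1]}

fast = iterate_saliency(sigma, rho, neighbors, coupling, n_iter=3)
slow = [best_curve_score(i, 3, sigma, rho, neighbors, coupling) for i in range(4)]
```

The iterative version touches each element and each of its successors once per step, so its cost grows linearly with n, whereas the exhaustive check enumerates k^n curves per element.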


5.2.2  Extending  the  Structural  Saliency  Method 


We incorporate the motion estimates to separate boundary segments
belonging to differently moving objects. Each of the three points that
constitute an oriented segment has a motion estimate associated with it.
Points are allowed to form an oriented segment only if their motion
estimates differ by no more than two displacement units. This prevents
contours from forming that wander across motion boundaries and thereby
violate the constraint that a flow field varies smoothly along a
boundary. The effectiveness of this constraint hinges on how well the
qualitative aspects of the motion field are estimated.
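The compatibility check on the three points of a segment can be sketched as follows. The text does not state how the difference between two motion estimates is measured, so this sketch assumes a per-component comparison of quantized (dx, dy) displacements. The function name `motion_compatible` is hypothetical.

```python
def motion_compatible(estimates, max_diff=2):
    """Return True if every pair of (dx, dy) estimates differs by at most
    max_diff displacement units in each component (assumed metric)."""
    return all(
        abs(a[0] - b[0]) <= max_diff and abs(a[1] - b[1]) <= max_diff
        for i, a in enumerate(estimates)
        for b in estimates[i + 1:]
    )

# Three points on the same moving surface: allowed to form a segment.
same_surface = motion_compatible([(0, 0), (1, 1), (2, 0)])
# The middle point moves very differently: the segment is rejected.
across_boundary = motion_compatible([(0, 0), (3, 0), (1, 0)])
```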
5.2.3 Extracting a Unique Contour

If the area in which curves are allowed to form is broadly defined, then
there will be several contours growing alongside each other, as is the case
in our examples. To extract the most salient curve, we have to first
propagate the structural saliency value of the most salient segment along
the curve that contributed to its value, because the saliency measure is
associated with each element and not with the entire curve. The
propagation is done iteratively by each segment maximizing over the
value of its preferred neighbor and its own [Sha'ashua in prep.]. Thus, the
largest value will be propagated along its curve.

Finally,  we  perform  a  non-maximal  suppression  operation  [Sha'ashua 
in  prep.],  where  each  segment  suppresses  all  its  neighboring  segments  if 
their  structural  saliency  value  is  less  and  if  they  have  similar  motion 
estimates  associated  with  them.  Hence,  the  most  salient  contours 
belonging to differently moving objects will remain alongside each other¹
(see  Figure  6.7). 
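The propagation and suppression steps can be sketched as follows, under several assumptions that are not taken from the thesis: each segment stores the index of a single preferred neighbor, "similar motion" means a per-component difference of at most two displacement units, and adjacency between segments is given explicitly. The propagation rule follows the sentence above literally, so each segment takes the maximum of its own value and that of its preferred neighbor. All names are hypothetical.

```python
def propagate(values, preferred):
    """Each segment repeatedly takes the max of its own value and the
    value of its preferred neighbor (synchronous sweeps)."""
    values = list(values)
    for _ in range(len(values)):  # enough sweeps for the longest chain
        values = [
            v if preferred[i] is None else max(v, values[preferred[i]])
            for i, v in enumerate(values)
        ]
    return values

def similar_motion(a, b, max_diff=2):
    """Assumed similarity test on two (dx, dy) motion estimates."""
    return abs(a[0] - b[0]) <= max_diff and abs(a[1] - b[1]) <= max_diff

def non_max_suppress(values, motions, adjacent):
    """Keep a segment unless an adjacent segment with similar motion
    has a larger saliency value."""
    return [
        not any(
            values[j] > values[i] and similar_motion(motions[i], motions[j])
            for j in adjacent[i]
        )
        for i in range(len(values))
    ]

# Two hypothetical contours growing side by side: segments 0-2 and 3-5.
saliency = [1.0, 2.0, 5.0, 4.0, 1.0, 0.5]
preferred = [1, 2, None, 4, 5, None]
adjacent = {0: [3], 1: [4], 2: [5], 3: [0], 4: [1], 5: [2]}

spread = propagate(saliency, preferred)
same_object = non_max_suppress(spread, [(1, 0)] * 6, adjacent)
two_objects = non_max_suppress(spread, [(1, 0)] * 3 + [(-4, 0)] * 3, adjacent)
```

When both contours carry the same motion, only the more salient one survives. When their motion estimates differ, both remain, which is the behaviour described for contours belonging to differently moving objects.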


¹ At the locations where the differently moving objects occlude each other, there will be two
boundary segments extracted that lie alongside each other, but where one of them is an artifact of
the occlusion. A next step could be to label the boundary segments that lie alongside each other
so that they receive a lower priority than boundary segments that do not have a boundary
segment belonging to another object close by, when the extracted boundaries are the input to a
recognition process.
In this chapter, we present results in which the developed methods have
been applied to motion sequences containing several moving objects
composed of either random-dot or natural textures. The methods for
estimating the motion boundaries have been implemented on the
Connection Machine, a massively parallel network of simple, locally
interconnected processors [16]. The smoothed intensity values are used as
the matching primitives and the histograms of the potential
displacements are used as the input representation.

6.1  The  Estimation  of  Motion  Boundaries 

The  early  detection  of  motion  boundaries  is  performed  in  two  stages: 
(i)  the  local  estimation  of  the  motion  discontinuities;  (ii)  the  extraction  of 
complete  boundaries  belonging  to  differently  moving  objects. 

The  methods  for  estimating  the  motion  boundaries  make  use  of  the 
fact  that  the  potential  displacements  of  image  points  in  the  vicinity  of  a 
motion  boundary  will  cluster  around  two  different  points  in  a  local 
velocity  histogram.  The  local  histograms  are  constructed  at  every  point 
using  a  circular  neighborhood  with  a  radius  of  eight  pixels.  The  potential 
displacements  are  quantized  and  they  are  measured  in  terms  of  pixels. 
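A local histogram over a circular neighborhood can be sketched as follows. This is an illustration and not the Connection Machine implementation. It assumes that each pixel already carries a list of quantized candidate displacements (dx, dy). The function name `local_histogram` and the toy input are hypothetical.

```python
from collections import Counter

def local_histogram(candidates, row, col, radius=8):
    """Count the candidate displacements of all pixels that lie within
    a circle of the given radius around (row, col)."""
    hist = Counter()
    rows, cols = len(candidates), len(candidates[0])
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            if (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                hist.update(candidates[r][c])
    return hist

# Toy 20 x 20 image: the left half moves right, the right half moves left.
candidates = [
    [[(1, 0)] if c < 10 else [(-1, 0)] for c in range(20)]
    for _ in range(20)
]
at_boundary = local_histogram(candidates, 10, 10)   # straddles the boundary
inside_left = local_histogram(candidates, 10, 1)    # well inside one region
```

At the boundary the histogram contains two clusters, whereas inside a region it contains only one.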

6.1.1  The  Bimodality  Tests  and  the  Bi-distribution  Test 

The Bimodality Tests, consisting of the peak-ratio, local-support-ratio,
signal-noise-ratio and the chi-square measure, estimate motion
boundaries by computing the degree of bimodality present in the local
histograms of the potential displacements. The Bi-distribution Test
detects boundaries by applying the Kolmogorov-Smirnov Test to
measure the probability that two histograms have been created by the
same population of motions.
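Two of these measures can be sketched under stated assumptions. The peak-ratio is not defined in this section, so the sketch assumes it is the second-highest count divided by the highest count and treats every bin as a candidate peak. For the Kolmogorov-Smirnov comparison, the sketch computes only the test statistic (the largest gap between the two cumulative distributions) for histograms over the same ordered one-dimensional bins, and omits the conversion to a probability. Both function names are hypothetical.

```python
def peak_ratio(hist):
    """Second-highest count divided by the highest count (assumed
    definition); values close to one suggest a bimodal histogram."""
    counts = sorted(hist.values(), reverse=True)
    if len(counts) < 2 or counts[0] == 0:
        return 0.0
    return counts[1] / counts[0]

def ks_statistic(hist_a, hist_b):
    """Largest absolute difference between the cumulative distributions
    of two histograms defined over the same ordered 1-D bins."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    cdf_a = cdf_b = 0.0
    largest_gap = 0.0
    for a, b in zip(hist_a, hist_b):
        cdf_a += a / total_a
        cdf_b += b / total_b
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

bimodal = peak_ratio({(1, 0): 8, (-1, 0): 8})     # two equally strong motions
unimodal = peak_ratio({(1, 0): 10, (0, 0): 1})    # one dominant motion
different = ks_statistic([10, 0, 0], [0, 0, 10])  # disjoint populations
identical = ks_statistic([5, 5], [5, 5])          # same population
```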
6.1.1.1 Complex Dynamic Random-Dot Display

Figure 6.1 shows the estimated boundaries for a complex random-dot
motion sequence which contains a rotating circle and rectangle, and a
translating square in the image plane. The first row displays the estimated
boundaries using thresholding. The second row displays the inferred
boundaries by detecting the global extrema and using a minimal
threshold.

Figure 6.1 Estimating Motion Boundaries in a Complex Dynamic
Random-Dot Display.

Rows: Thresholding; Detecting Global Extrema. Columns: peak-ratio,
signal-noise-ratio, local-support-ratio, chi-square, Kolmogorov-Smirnov.


For the above example, the peak-ratio and the signal-noise-ratio
successfully estimate all the motion boundaries, and they mark very few
false boundaries. The reason these two measures perform so well is that
they directly measure the degree of bimodality occurring in the local
histograms, whereas the other measures do it indirectly. The
local-support-ratio, the chi-square measure and the Kolmogorov-Smirnov
measure also successfully infer where motion boundaries are present, but
they mark more incorrect boundaries. The chi-square measure has a high
false alarm rate inside the two rotating objects, because the highest peak is
broadly defined at the borders between regions of constant displacement
that differ only by one displacement unit. The Kolmogorov-Smirnov
measure has a high false alarm rate at the center of the rotating circle,
because any two histograms that are being compared will have their peaks at
different locations.

The false alarms can be ruled out by overlapping the thickened
extrema contours of several of the measures, because these measures
have a global extremum at a motion boundary, whereas their local extrema
elsewhere in the image are weakly correlated with each other. Figure 6.2
shows the results of intersecting the thickened extrema contours of the
peak-ratio, signal-noise-ratio and local-support-ratio to infer the motion
boundaries. Panels (a), (b) and (c) display the intersections of the extrema
contours thickened by one, two and three pixels, respectively. This
approach has the attractive feature that it does not require the setting of a
threshold and it can be used to rule out false alarms. Figure 6.2
demonstrates that the measures are highly correlated at a motion
boundary, whereas elsewhere in the image they are weakly correlated
with each other.
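The intersection of thickened extrema contours can be sketched with sets of pixel coordinates. The sketch assumes that thickening by w pixels means growing each contour point into a (2w+1) x (2w+1) square, which is one possible reading, and it ignores the image bounds. The function names and the toy contours are hypothetical.

```python
def thicken(points, width):
    """Grow every contour point into a square of side 2 * width + 1."""
    return {
        (r + dr, c + dc)
        for (r, c) in points
        for dr in range(-width, width + 1)
        for dc in range(-width, width + 1)
    }

def intersect_thickened(contours, width):
    """Pixels that lie on the thickened contours of every measure."""
    return set.intersection(*(thicken(p, width) for p in contours))

# Three hypothetical measures respond to a vertical boundary near column 5.
# Each also marks one false alarm at a different, unrelated location.
measure_a = {(r, 5) for r in range(5)} | {(2, 0)}
measure_b = {(r, 6) for r in range(5)} | {(0, 12)}   # localized one pixel off
measure_c = {(r, 5) for r in range(5)} | {(4, 20)}

agreed = intersect_thickened([measure_a, measure_b, measure_c], width=1)
unthickened = intersect_thickened([measure_a, measure_b, measure_c], width=0)
```

Thickening tolerates the one-pixel disagreement in localization, while the isolated false alarms drop out because no other measure responds near them.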


Figure 6.2 Intersecting the Extrema Contours of the Developed Measures to Estimate
Motion Boundaries.


(a) thickened by one pixel


(b) thickened by two pixels


(c) thickened by three pixels
6.1.1.2 Natural Motion Sequence

Figure 6.3 (a) shows the Canny edges of the Salisbury Robot Hand; (b)
displays the estimated motion boundaries when the hand is lifting the
object that it is holding, where the peak-ratio has been thresholded and its
output has been suppressed where the average intensity gradient was not
sufficiently large; (c) shows the detected global maxima of the peak-ratio.

Figure 6.3 Estimating Motion Boundaries in a Natural Image Sequence.


(a)  Canny  Edges 


(c)  Global  maxima  of  peak-ratio 
6.1.2 Dynamic Occlusion Method

Figure 6.4 (a) shows where the Dynamic Occlusion Method estimated the
appearance or disappearance of thin-bars in a random-dot display of a
translating square. This method gives a rough sense of the motion
boundary, although it does not provide complete boundaries. Panel (b) shows
the output of this method for the same motion display as in Figure 6.1.
The marked locations provide a sense of the boundaries for this more
complex display, although there are false alarms in the rotating regions.


Figure  6.4  Estimating  Motion  Boundaries  using  the  Dynamic  Occlusion  Method. 


6.2  The  Estimation  of  Visual  Motion 

A  local  histogram  of  the  potential  displacements  has  its  highest  peak  at 
the  displacement  that  received  the  most  local  support.  Hence,  this 
displacement  represents  an  estimate  of  the  image  flow. 
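Reading the flow estimate off a local histogram can be sketched as follows. The sketch assumes the histogram is a mapping from quantized displacements to support counts. The returned share of the total support is an extra quantity added here for illustration and is not one of the measures defined in the thesis. The name `estimate_flow` is hypothetical.

```python
def estimate_flow(hist):
    """Return the displacement with the most local support together with
    its share of the total support, or (None, 0.0) if hist is empty."""
    if not hist:
        return None, 0.0
    best = max(hist, key=hist.get)
    return best, hist[best] / sum(hist.values())

# Hypothetical local histogram of potential displacements.
flow, share = estimate_flow({(1, 0): 12, (0, 0): 3, (-1, 0): 5})
```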

6.2.1 Complex Dynamic Random-Dot Display

The first panel in Figure 6.5 shows the estimated image flow field for a
complex random-dot motion sequence which contains a rotating circle
and rectangle, and a translating square. The second panel displays the
error in the computed flow field. As expected, the error is largest in the
vicinity of the motion boundaries. In the interior of the rotating objects,
there are also small errors at the borders between the regions of constant
displacement that differ only by one displacement unit.




Figure 6.5 Estimating the Image Flow Field.

The first panel shows the estimated image flow field for a complex random-dot motion
sequence which contains a rotating circle and rectangle, and a translating square. The
second panel displays the error in the computed flow field. As expected, the error is
largest in the vicinity of the motion boundaries. In the interior of the rotating objects,
there are also small errors at the borders between the regions of constant displacement
that differ only by one displacement unit.
6.3 Extracting Complete & Unique Motion Boundaries

The pointwise output of the motion boundary estimators is often broadly
localized and it can contain gaps. We apply and modify the Structural
Saliency Method developed by Sha'ashua & Ullman to extract single and
unique boundaries without gaps.

6.3.1 Complex Dynamic Random-Dot Display

Figure 6.6 (a) shows the estimated motion boundaries for a random-dot
motion sequence which contains a translating circle, rectangle and square,
where the peak-ratio has been used and thresholded to provide the
estimate. Panel (b) displays the three most salient structures extracted by the
Structural Saliency Method, where the motion estimates are used to
ensure that the contours do not wander across motion boundaries. The
purpose of the second stage is to extract complete boundaries from an
input that can be noisy.

Figure  6.6  Extracting  Complete  &  Unique  Motion  Boundaries. 
7.1 Stereopsis

Stereopsis computes relative depth by using the differences, also called
disparities, in the projection of points in space onto the two eyes or
cameras, which view the scene from two slightly different vantage points.
The key problem of stereopsis is how to match points in the two images
that correspond to the same point in space. This correspondence problem
is inherently underdetermined and constraints are needed to solve it. As
for the computation of the image flow field, the assumption is typically
made that the surfaces of objects are generally smooth, i.e., that the
disparity varies smoothly almost everywhere in the image. This
constraint is not valid across depth boundaries, and so far, most stereo
algorithms not only fail to detect discontinuities in depth directly but
also perform badly precisely at these locations [10,46]. The methods
developed for the early detection of motion boundaries are relevant to
stereopsis in the following ways.

First, stereopsis is a special case of general motion, because its disparity
fields are equivalent to image flow fields created by a restricted class of
motions and all the motion boundaries are due to depth discontinuities.
Hence, these depth boundaries can be detected by the methods developed
for general motion at a stage prior to the depth computation, where the
two images do not need to be registered.

Second, motion boundaries can be used as stereo matching features
and there is psychological evidence that the human visual system is able
to do this [21,22,32]. The motion boundaries can be matched using the
ordering constraint, i.e., if a motion discontinuity is to the left of another
motion discontinuity in the left image then this ordering will be
preserved in the right image, and vice versa. Further, the figural
continuity and edge connectivity constraints can be applied, because the
motion boundaries will form continuous contours [5,26].
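The ordering constraint can be checked with a few lines of code. The sketch assumes that the candidate matches along one scanline are given as pairs of horizontal positions (x_left, x_right). This representation and the name `satisfies_ordering` are hypothetical.

```python
def satisfies_ordering(matches):
    """Return True if the left-to-right order of the matched boundary
    points in the left image is preserved in the right image."""
    ordered = sorted(matches)  # sort by position in the left image
    return all(a[1] < b[1] for a, b in zip(ordered, ordered[1:]))

# Hypothetical matches of motion discontinuities on one scanline.
consistent = satisfies_ordering([(10, 7), (20, 15), (30, 28)])
crossed = satisfies_ordering([(10, 15), (20, 7)])
```

A set of candidate matches that fails this test contains at least one crossing pair and can be rejected.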

Third, the detected motion and depth boundaries can be used as
pointers to the regions in the two images that do not possess a match in
the other image due to occlusion. In particular, these occluded regions
will always be to the right (left) of a depth discontinuity in the left (right)
image, for perspective projection. A search could be performed in the
neighborhood of a detected motion or depth boundary to determine the
extent of an occluded region. Finally, the corresponding points that are
visible in both eyes could then be matched using the ordering constraint,
thereby simplifying the correspondence problem.

Fourth,  a  stereo  algorithm  can  be  devised  that  simultaneously 
computes  depth  and  its  discontinuities,  because  the  highest  peak  in  the 
local  histogram  of  the  potential  disparities  estimates  the  disparity,  and  the 
depth  boundaries  can  be  inferred  where  the  peak-ratio,  for  example,  is 
close  to  one. 

To summarize, it is advantageous to detect depth or motion
boundaries prior to the stereo computation and to use them in it, because
they make explicit where the smoothness assumption is not valid, and they
could be used to simplify the correspondence problem.

7.2  Surface  Reconstruction 

In most models of stereopsis, disparity is initially computed at specific
locations, such as where intensity changes sharply. The surface
reconstruction from this sparse and noisy data can be formulated in terms
of minimizing an energy functional [14,20,40]. In particular, the surface
reconstruction should be performed as a piecewise smooth interpolation
to account for the existence of several surfaces within a scene. Without
the knowledge of the locations of depth discontinuities, the information
about the shape of one surface can affect the shape of an adjacent surface,
i.e., the surface reconstruction scheme will smooth over the boundaries.
Hence, the early detection of depth boundaries is of special importance
because it makes explicit where not to smoothly interpolate the sparse
depth map. The methods described in this thesis can be used to segment
sparse and noisy depth maps (see Figure 7.1).


Figure  7.1  Detecting  Boundaries  in  a  Sparse  Depth  Map. 

(a) Shows the synthetic depth map used as the test input. The depth map has a depth
range of 200 units. The resolution is reduced by a factor of 10, because the depth gradient is
too large and changes too rapidly over the spatial support used to construct the local
histograms of the depth estimates. (b) Displays the depth boundaries detected by
thresholding the signal-noise-ratio, where the sparseness of the data is 10% and Gaussian
noise has been added.


(a)  Synthetic  depth  map 


(b)  Signal-noise-ratio  thresholded 
Chapter 8

Summary  &  Conclusion 


This thesis has shown, firstly, that a useful segmentation can be
performed on the basis of motion information alone at an early stage of
visual processing. Secondly, it has been demonstrated that the estimation
of motion boundaries can be decoupled from the computation of a full
image flow field and that it can be performed in parallel. Thirdly, this
thesis has shown how to integrate the pointwise output of the developed
motion boundary estimators with a process that can extract salient,
complete and unique contours, where contour segments belonging to
differently moving objects are separated and segments belonging to the
same object are grouped together. The detection of motion boundaries
has been performed in two stages: (i) the local estimation of the motion
discontinuities and of the visual flow field; (ii) the extraction of complete
boundaries belonging to differently moving objects.

8.1  The  First  Stage 

For the first stage, three new methods have been presented that can
independently estimate the presence and location of motion boundaries:
the Bimodality Tests, the Bi-distribution Test, and the Dynamic Occlusion
Method. These methods require only local computations. They have been
implemented on the Connection Machine, a parallel network of simple,
locally interconnected processors.

The  Bimodality  Tests  and  the  Bi-distribution  Test  make  use  of  the  fact 
that  at  a  motion  boundary  certain  quantities,  which  can  be  easily 
computed  from  an  image  sequence,  will  cluster  around  two  different 
points  in  a  local  histogram.  The  quantities  in  question  are  (i)  the  potential 
displacements  of  an  image  point,  or  (ii)  the  flow  component  measured  in 
the  direction  of  the  intensity  gradient.  The  local  histograms  are 
constructed  at  every  point  using  a  circular  support,  whose  radius  ranges 
between  five  and  eight  pixels. 
We use a Gaussian matching function, which depends on the
difference in intensity at the two points defining a displacement, to
compute the match score of a possible displacement. This matching
function has been chosen to account for the fact that the intensity values
at corresponding points can change due to noise and changes in
illumination. Further, we use the magnitude of the intensity gradient or
its local average to suppress false alarms in regions with little texture.
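A Gaussian matching function of this kind can be sketched as follows. The width parameter `sigma` and its default value are assumptions made for illustration, since this summary does not state the value that was used. The name `match_score` is hypothetical.

```python
import math

def match_score(intensity_a, intensity_b, sigma=10.0):
    """Gaussian of the intensity difference between the two points that
    define a displacement: 1.0 for identical values, falling smoothly
    towards 0 as the difference grows. sigma is an assumed width."""
    diff = intensity_a - intensity_b
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))

exact = match_score(100, 100)    # identical intensities
noisy = match_score(100, 110)    # small change, e.g. noise or illumination
poor = match_score(100, 130)     # large change, unlikely to correspond
```

Because the score decays smoothly, a small intensity change still yields substantial support for a displacement instead of rejecting it outright.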

We assume that the image flow field can be approximated as locally
constant. Hence neighboring points will have a potential displacement in
common. We can relax this assumption by using a Gaussian spatial
support function that gives less weight to contributors that are farther
away from the point at which the histogram is computed. This will account
for the fact that the flow vectors at points farther apart are less likely to
be equal in a smoothly varying flow field. It will also cause the response
of the Ratio measures to be sharpened.
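The Gaussian spatial support can be sketched as a weighted histogram. The sketch assumes that votes are given as pairs of a pixel position and a quantized displacement. The parameter `spread` and its default value are assumptions, as is the name `weighted_histogram`.

```python
import math
from collections import defaultdict

def weighted_histogram(votes, center, radius=8, spread=4.0):
    """Accumulate displacement votes inside a circular support, weighting
    each vote by a Gaussian of its distance from the center."""
    hist = defaultdict(float)
    for (r, c), displacement in votes:
        d2 = (r - center[0]) ** 2 + (c - center[1]) ** 2
        if d2 <= radius ** 2:
            hist[displacement] += math.exp(-d2 / (2.0 * spread ** 2))
    return dict(hist)

# Hypothetical votes: one at the center, one six pixels away, one outside.
votes = [((0, 0), (1, 0)), ((0, 6), (-1, 0)), ((0, 20), (-1, 0))]
hist = weighted_histogram(votes, center=(0, 0))
```

The vote at the center counts fully, the vote six pixels away counts for about a third, and the vote outside the support is ignored.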

The Bimodality Tests consist of four measures that monitor the degree
of bimodality present in the local histograms of either the potential
displacements or the normal flow components. The peak-ratio, the
local-support-ratio and the signal-noise-ratio can be computed from the local
histograms directly, and each of them captures a different characteristic of
a motion boundary. The chi-square measure estimates bimodality by
measuring how well a Gaussian distribution can be fitted to a local
histogram. Of these four measures, the peak-ratio and signal-noise-ratio
estimate motion boundaries most accurately and reliably, because they
directly measure the degree of bimodality present in the local histograms.
It was also found that more than one of these measures can be combined
to detect boundaries and to rule out false alarms by intersecting the
thickened extrema contours of several of these measures.
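The idea behind the chi-square measure can be illustrated on a one-dimensional histogram. This is a sketch under assumptions and not the procedure used in the thesis: the Gaussian is fitted by matching the mean and variance of the histogram, the expected count of each bin is obtained from the fitted cumulative distribution, and Pearson's chi-square statistic measures the misfit. The function names are hypothetical.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def chi_square_gaussian(centers, counts, bin_width=1.0):
    """Pearson chi-square between a 1-D histogram and a Gaussian fitted
    by matching the histogram's mean and variance (assumed fit)."""
    total = sum(counts)
    mean = sum(x * n for x, n in zip(centers, counts)) / total
    var = sum(n * (x - mean) ** 2 for x, n in zip(centers, counts)) / total
    std = max(math.sqrt(var), 1e-6)  # guard against a single occupied bin
    chi2 = 0.0
    for x, n in zip(centers, counts):
        expected = total * (
            normal_cdf(x + bin_width / 2, mean, std)
            - normal_cdf(x - bin_width / 2, mean, std)
        )
        if expected > 1e-9:  # skip bins where the fit predicts no mass
            chi2 += (n - expected) ** 2 / expected
    return chi2

centers = list(range(-4, 5))
one_motion = [0, 1, 6, 24, 38, 24, 6, 1, 0]     # single cluster
two_motions = [0, 30, 20, 0, 0, 0, 20, 30, 0]   # two separated clusters

fit_single = chi_square_gaussian(centers, one_motion)
fit_double = chi_square_gaussian(centers, two_motions)
```

A single Gaussian fits the unimodal histogram well and the bimodal one poorly, so a large statistic indicates a possible motion boundary.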

The Bi-distribution Test, which uses the non-parametric statistical
Kolmogorov-Smirnov test, can compare any two distributions. But this
method often does not perform as well as the Bimodality Tests, because
the local histograms used in the detection of motion boundaries can be
sufficiently different even for nearby points belonging to the same
moving object.
We have developed five different measures because
they each capture and monitor a different characteristic of a motion
boundary. We have shown that these measures have a global extremum at
a motion boundary, whereas their local extrema elsewhere in the image
are weakly correlated with each other. Thresholds have been derived for
the different measures, and we have shown how to use thresholding and
the detection of global extrema as ways to infer the presence of motion
boundaries. In particular, the approach that combines and intersects the
thickened extrema contours to estimate the boundaries has the attractive
feature that it does not require the setting of a threshold. The motion
boundaries are inferred by corroborating the information provided by
these measures, and good results have been obtained.

The  Dynamic  Occlusion  Method  uses  the  fact  that  thin-bars  are  created 
or  destroyed  at  a  motion  boundary.  Dynamic  occlusion  of  these  simple 
features  can  be  computed  locally  in  a  way  that  can  estimate  boundaries 
prior  to  the  computation  of  motion  without  having  to  solve  global 
correspondence. 

It has also been shown that the visual flow field can be locally
estimated as a by-product of the early estimation of motion boundaries.
The highest peak in a local histogram of the potential displacements
estimates the local image flow. The measures that are sensitive to the
degree of bimodality present in the local histograms reflect how good the
estimate is. It was noted that the developed method to compute visual
motion is well-posed and that it is similar to the local voting scheme
proposed by Bülthoff, Little & Poggio [7].

8.2  The  Second  Stage 

We have applied and modified the Structural Saliency Method
developed by Sha'ashua & Ullman [37,45] to extract complete and unique
boundaries from the pointwise output of the first stage, which is often
broadly defined and can contain gaps. Boundary segments belonging to
differently moving objects have been separated by using the motion
estimates provided by the first stage to constrain which edge segments can
be formed.
The Structural Saliency Method extracts boundaries and closes gaps by
employing a simple iterative scheme that uses an optimization approach
to measure the saliency of curves of line segments in terms of their
smoothness and length. The optimization problem is formulated in
terms of maximizing a structural saliency measure Q(n) over all curves
of length n starting from P.

The computation is linear in n because Q has been chosen to be an
extensible function. Hence, the most salient curve of length n at P will be
equal to the maxima over all segments leaving P and the maximal curves
of length (n-1) starting at the respective end-points of these segments. The
saliency measure is associated with each segment and not with the entire
curve.

Because the area defined by the first stage is broadly defined, there will
be several contours growing alongside each other. To extract the most
salient curve, we propagate the saliency value of the most salient
segment along the curve that contributed to its value. This is done
iteratively by each segment maximizing over the value of its preferred
neighbor and its own. Thus, the largest value is propagated along its
curve. Finally, we perform a non-maximal suppression operation, where
each segment suppresses all its neighboring segments if their saliency
value is lower and if they have similar motion estimates associated with
them. Hence, the most salient contours belonging to differently moving
objects remain alongside each other.

Finally,  we  have  presented  results  that  show  that  the  developed 
methods  can  successfully  segment  scenes  with  several  independently 
moving  objects,  without  prior  knowledge  of  the  shape  and  motion  of  the 
objects.  We  have  also  shown  that  the  developed  methods  can  segment 
sparse  depth  maps. 